I have a text file with a single line that contains the single Yiddish word azoy, in Hebrew script: אַזױ. Then I grep for occurrences of the oy character, ױ. (This is highly simplified, of course; in reality it’s not so trivial.)
By default, grep (I run it under Lubuntu 23.10) colors its results. When I disable that, everything works fine: grep --color=never ױ filewithazoy correctly finds and displays: אַזױ. But when I do not disable result coloring, the oy character is correctly displayed in red, but in the WRONG order: ױאַז
I suppose this is caused by the escape sequences for rendering the colors: they contain an m and a K. Apparently Unicode’s bidirectional algorithm is applied BEFORE interpreting the escape sequences, so the presence of Latin characters messes up the order of the Hebrew characters. I think it should be the other way round: render the colors from the escape sequences, and THEN apply the bidirectional algorithm to the Hebrew-only result. But that’s probably easier said than done.
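To make the suspicion concrete, here is a minimal Python sketch of the character stream the terminal would receive. It assumes GNU grep's default match coloring (ESC[01;31m ESC[K before the match, ESC[m ESC[K after it); a custom GREP_COLORS setting would change the digits. The helper names colorize, strip_ansi and bidi_classes are made up for this illustration. It only lists the Unicode bidi class of each character; it does not show what any particular terminal actually does with them.

```python
import re
import unicodedata

ESC = "\x1b"
# Assumption: GNU grep's default match colouring (GREP_COLORS "ms=01;31").
SGR_START = ESC + "[01;31m" + ESC + "[K"
SGR_END = ESC + "[m" + ESC + "[K"

# The word from the question, written as escapes to avoid display confusion:
# alef, patah, zayin, Yiddish vav-yod ligature.
AZOY = "\u05d0\u05b7\u05d6\u05f1"
OY = "\u05f1"

# Matches the simple CSI sequences used above (e.g. ESC[01;31m, ESC[K).
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def colorize(line, match):
    """Wrap each occurrence of match in the assumed grep escape sequences."""
    return line.replace(match, SGR_START + match + SGR_END)


def strip_ansi(text):
    """Remove the escape sequences, leaving only the visible text."""
    return ANSI_RE.sub("", text)


def bidi_classes(text):
    """Return (code point, Unicode bidi class) for every character."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]


if __name__ == "__main__":
    colored = colorize(AZOY, OY)
    # The 'm' and 'K' of the escape sequences are strong left-to-right (L)
    # characters sitting between strong right-to-left (R) Hebrew letters.
    for code_point, bidi_class in bidi_classes(colored):
        print(code_point, bidi_class)
    # With the escape sequences removed, the logical order is unchanged.
    print(strip_ansi(colored) == AZOY)
```

If a terminal ran the bidirectional algorithm over this raw stream, the L-class letters would split the Hebrew into separate runs, which would be consistent with the reordering described above. Whether qterminal really works that way is still only a guess.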
What I tried, without success:
- Install a he_IL locale, and activate it for qterminal and bash.
- The same, but with not only LC_ALL, but also LANG and LANGUAGE set to Hebrew.
- Run grep with --color=always, so it sends coloring also into a pipe. I wrote a little C program that adds the Unicode character U+200F (right-to-left mark) before and after the line.
- The same, but with the second mark placed just before the newline, not after it.
- The same with a U+202E (right-to-left override) at the beginning of the line.
Nothing worked for me.
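For reference, the filter from the last three attempts can be sketched roughly as follows. This is a Python approximation, not the original C program: wrap_line and the three mode names are hypothetical, and the rlo_start mode assumes that U+202E takes the place of the leading U+200F, which the description above does not pin down.

```python
import sys

RLM = "\u200f"  # RIGHT-TO-LEFT MARK
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE


def wrap_line(line, mode="rlm_before_newline"):
    """Add directional marks around one line of (possibly coloured) text."""
    has_newline = line.endswith("\n")
    body = line[:-1] if has_newline else line
    newline = "\n" if has_newline else ""

    if mode == "rlm_after_newline":
        # Mark before the line and after the newline.
        return RLM + body + newline + RLM
    if mode == "rlm_before_newline":
        # Mark before the line and just before the newline.
        return RLM + body + RLM + newline
    if mode == "rlo_start":
        # Assumed variant: override at the start, mark before the newline.
        return RLO + body + RLM + newline
    raise ValueError(f"unknown mode: {mode}")


def main():
    mode = sys.argv[1] if len(sys.argv) > 1 else "rlm_before_newline"
    for line in sys.stdin:
        sys.stdout.write(wrap_line(line, mode))


if __name__ == "__main__":
    main()
```

Saved under a name such as wrap.py (also hypothetical), it would sit at the end of the pipe: grep --color=always ױ filewithazoy | python3 wrap.py rlm_before_newline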
How do people in Israel do this? Or those working with Yiddish in New York etc.?