
Word wrapping in pango with mixed scripts

I have a text box implementation that uses Pango. If I put in a string that starts with a word in a right-to-left script, followed by a space, followed by a word in a left-to-right script, the word wrapping that Pango uses gets messed up (using PANGO_WRAP_WORD_CHAR). For the string العربية ENGLISH I get the following:

[Screenshot: bad word wrapping]

If I add the Unicode character U+200F after the space, I get the expected word wrapping:

[Screenshot: expected word wrapping]
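As a minimal illustration (Python, standard library only; the variable names are made up for this sketch), this builds the question's string with U+200F inserted after the space. The mark is a zero-width, strongly right-to-left character, so it only affects the bidi analysis and draws nothing itself.

```python
import unicodedata

# Original string: an Arabic word, a space, then an English word.
plain = "العربية ENGLISH"

# The same string with U+200F RIGHT-TO-LEFT MARK inserted right after the space.
RLM = "\u200f"
marked = plain.replace(" ", " " + RLM, 1)

print(unicodedata.name(RLM))           # RIGHT-TO-LEFT MARK
print(unicodedata.bidirectional(RLM))  # R (a strong right-to-left character)
print(len(marked) - len(plain))        # 1 (one invisible character added)
```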

Also, if I replace the Arabic word above with Hindi (which is left-to-right, like the English next to it), I still get the problem, so it doesn't seem to be strictly a left-to-right versus right-to-left issue. In the Hindi case, a hack that inserts U+200E after the space resolves the problem.
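One way to see why direction alone does not explain the Hindi result: Python's standard `unicodedata` module reports the same strong bidi class, `L`, for Devanagari letters as for Latin letters, while Arabic letters are `AL`. The helper below is a hypothetical illustration, not part of Pango.

```python
import unicodedata

# Strong bidi classes: left-to-right, right-to-left, Arabic letter.
STRONG = {"L", "R", "AL"}

def strong_classes(word):
    """Return the set of strong bidi classes used by the characters of word.

    Non-strong classes (for example NSM, used by some Devanagari vowel
    signs) are filtered out."""
    return {unicodedata.bidirectional(ch) for ch in word} & STRONG

for word in ["ENGLISH", "पहुंचगया", "العربية"]:
    print(word, sorted(strong_classes(word)))
# ENGLISH ['L']
# पहुंचगया ['L']
# العربية ['AL']
```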

Is this a bug in Pango? Are there workarounds I can try that are generic enough to fix the problem without breaking other cases? My current workaround inserts either U+200E or U+200F after every space, based on the direction of the previous strongly directional character in the string, but I'm not sure whether there are strings this will cause problems with.
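A minimal Python sketch of the workaround described above. The question does not show its actual code, so this is an assumed reconstruction: it tracks the last strong character (bidi classes `R` and `AL` count as right-to-left, `L` as left-to-right) and appends the matching mark after each ASCII space. Spaces that come before any strong character get no mark, and other whitespace such as tabs or U+00A0 is left alone. Both of these are arbitrary choices made by this sketch.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

def add_marks_after_spaces(text):
    """Insert LRM or RLM after each ASCII space, matching the direction of
    the most recent strongly directional character before that space."""
    out = []
    last_strong = None  # 'L' or 'R'; None until a strong character is seen
    for ch in text:
        out.append(ch)
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            last_strong = "L"
        elif cls in ("R", "AL"):
            last_strong = "R"
        elif ch == " " and last_strong is not None:
            # Assumption: only U+0020 gets a mark; other whitespace is untouched.
            out.append(RLM if last_strong == "R" else LRM)
    return "".join(out)

print(add_marks_after_spaces("العربية ENGLISH") == "العربية \u200fENGLISH")   # True
print(add_marks_after_spaces("पहुंचगया ENGLISH") == "पहुंचगया \u200eENGLISH")  # True
```

One caveat for a text box: every inserted mark shifts the indexes that follow it, so cursor positions and selections reported by the layout have to be mapped back to the unmodified text.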

Update: I was able to reproduce this problem on Ubuntu 12.04 with gedit (with the "Enable text wrapping" and "Do not split words over two lines" settings enabled). I typed Hello world over and over until it wrapped several times, then replaced all instances of world with पहुंचगया, and everything collapsed onto a single line.

Asked Dec 09 '15 18:12 by default


3 Answers

The characters U+200F and U+200E are the RIGHT-TO-LEFT MARK and the LEFT-TO-RIGHT MARK. So:

  • between each English text and Arabic text, put a RIGHT-TO-LEFT MARK
  • between each Arabic text and English text, put a LEFT-TO-RIGHT MARK

It is a bug, because Pango should do this automatically when displaying text, but since Pango isn't doing it, you have to do it manually.
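A sketch of this rule in Python, assuming "English" and "Arabic" text mean runs of strong left-to-right and right-to-left characters as reported by the standard `unicodedata` module (the function names are hypothetical). The mark is placed directly before the first strong character of each new run. For the question's string العربية ENGLISH, this rule inserts U+200E, whereas the question reports that U+200F after the space gave the expected wrapping, so it is worth testing both against the Pango version in use.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

def strong_direction(ch):
    """Return 'L' or 'R' for strongly directional characters, else None."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):
        return "R"
    return None

def mark_direction_changes(text):
    """Insert a mark before the first character of each new strong run:
    RLM when going from LTR to RTL text, LRM when going from RTL to LTR."""
    out = []
    prev = None  # direction of the previous strong character
    for ch in text:
        d = strong_direction(ch)
        if d is not None and prev is not None and d != prev:
            out.append(RLM if d == "R" else LRM)
        if d is not None:
            prev = d
        out.append(ch)
    return "".join(out)

print(mark_direction_changes("ENGLISH العربية") == "ENGLISH \u200fالعربية")  # True
```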

Answered Nov 08 '22 09:11 by Assem


It seems to me to be a bug, or an incomplete feature, that shows up with mixed scripts.

It seems you are using an old Pango release, maybe the one from Ubuntu 12.04?

Ubuntu 12.04 contains Gedit 3.4
Ubuntu 15.10 contains Gedit 3.10

Pango changed radically in 3.6: it replaced its own shaping engine with HarfBuzz. [2]

I couldn't reproduce the bug using gedit on Ubuntu 15.10: it always moves both words down, and it does not let me resize its window to try splitting those two words. See the screenshot.

[Screenshot: Pango shaping mixed scripts in gedit]

Update:

It seems its behavior has changed:

  • It does not wrap the first English word when the text starts with Arabic.

    pango-view  --text "وقعت أطراف سياسية ليبية اليوم في المغرب اتفاق سلام برعاية أممية aljazeeranet" --width=70 --margin=0 --wrap=word 
    


  • Same as the previous case: it does not wrap, and it enforces the width.

    pango-view  --text "elections الجزيرة" --width=30 --margin=0 --wrap=word
    


References:

  • Port pango to Harfbuzz
Answered Nov 08 '22 09:11 by user.dz


Note: we recently upgraded the version of Pango we use from 1.36.1 to 1.38.1, and this issue went away. So I believe this was a bug in Pango or HarfBuzz that has since been fixed.
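If an application has to run against both older and newer Pango releases, one option is to apply the mark workaround only on older versions. Below is a minimal Python sketch. It assumes a plain dotted numeric version string, obtained for example from `pango_version_string()` in C. The default threshold of 1.38.1 is only the version the poster upgraded to, and the actual fix may be in an earlier release. The function name is hypothetical.

```python
def needs_mark_workaround(pango_version, fixed_in=(1, 38, 1)):
    """Return True if the dotted version string is older than fixed_in.

    fixed_in defaults to 1.38.1 only because that is the version the
    poster upgraded to; the real fix may have landed earlier."""
    # Assumption: purely numeric components such as "1.36.1" (no "-rc" suffixes).
    parts = tuple(int(p) for p in pango_version.split(".")[:3])
    return parts < fixed_in

print(needs_mark_workaround("1.36.1"))  # True
print(needs_mark_workaround("1.38.1"))  # False
```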

Answered Nov 08 '22 07:11 by default