
Arabic: 'source' Unicode to final display Unicode

Tags: c++, c, arabic

simple question:

this is the final display string I am looking for

لعبة ديدة

now below is each of the separate characters, before being 'glued' together (so I've put a space between each of them to stop the joining)

ل ع ب ة د ي د ة

note how they are NOT the same characters, there is some magical transform that melds them together and converts them to new Unicode characters.

also, in the display string above the characters appear right to left (in memory, they are stored left to right)

so my simple question is this: where do I get a platform-independent C/C++ function that will take my source 16-bit Unicode string and transform it into the Unicode string that produces the first one quoted above, doing both the RTL conversion and the joining?

that's all I want, one function that does that.

UPDATE:

ok, yes, I know that the 'characters' are the same in the two examples above; they are the same 'letters', but (viewing in Chrome, or the latest IE) anyone can CLEARLY see that the glyphs are different. now, I'm fairly confident that this transform can be done at the Unicode level, because my font file and the Unicode standard both seem to specify different glyphs for the separate and the various joined versions of the characters/letters (unicode.org/charts/PDF/UFB50.pdf, unicode.org/charts/PDF/UFE70.pdf).

so, can I just put my Unicode into a function and get the transformed Unicode out?

asked Oct 18 '11 07:10 by matt


2 Answers

The joining and RTL conversion don't happen at the level of Unicode characters.

In other words: neither the order of the characters nor the actual Unicode code points are changed during this process.

In fact, both the joining and the RTL/LTR transitions are handled by the text rendering engine.

This quote from the Wikipedia article on the Arabic alphabet explains it quite nicely:

Finally, the Unicode encoding of Arabic is in logical order, that is, the characters are entered, and stored in computer memory, in the order that they are written and pronounced without worrying about the direction in which they will be displayed on paper or on the screen. Again, it is left to the rendering engine to present the characters in the correct direction, using Unicode's bi-directional text features. In this regard, if the Arabic words on this page are written left to right, it is an indication that the Unicode rendering engine used to display them is out-of-date.
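To make the point about unchanged code points concrete, here is a minimal C++ sketch. It assumes the two strings in the question were typed with the ordinary letters of the Arabic block (U+0600..U+06FF), which is what they appear to be; the names `strip_spaces`, `kJoined`, `kSpaced` and `check_same_code_points` are made up for this illustration.

```cpp
#include <cassert>
#include <string>

// Remove U+0020 spaces so that only the letters remain, in memory order.
std::u16string strip_spaces(const std::u16string& s) {
    std::u16string out;
    for (char16_t c : s) {
        if (c != u' ') out.push_back(c);
    }
    return out;
}

// The display string from the question, written as escapes so the
// code points are explicit: lam, ain, beh, teh marbuta, space,
// dal, yeh, dal, teh marbuta (assumed spelling, see the text above).
const std::u16string kJoined =
    u"\u0644\u0639\u0628\u0629 \u062F\u064A\u062F\u0629";

// The same letters separated by spaces, which stops the renderer
// from joining them.
const std::u16string kSpaced =
    u"\u0644 \u0639 \u0628 \u0629 \u062F \u064A \u062F \u0629";

// Once the spaces are dropped, both strings hold identical code
// points in identical (logical) order.
void check_same_code_points() {
    assert(strip_spaces(kJoined) == strip_spaces(kSpaced));
    assert(kJoined[0] == 0x0644);  // the first letter typed is stored first
}
```

Under that assumption, the only difference between the two strings is the spaces; the different glyphs seen on screen come from the renderer, not from the stored text.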

answered Sep 26 '22 03:09 by Joachim Sauer


The processing you're looking for is called shaping; in Arabic it covers contextual letter joining as well as ligatures. Unlike many Latin-based languages, where you can simply put one character after another to render the text, shaping is fundamental in Arabic. The substitution is done in the text rendering engine, and the glyph substitution information is generally stored in font files.

note how they are NOT the same characters

They are the same for an Arabic reader. It is still readable. There is no transform to do on your UTF-16 source text. You must provide the whole string to your text renderer. In C/C++, and since you are going the platform-independent way, you can use Pango for rendering.
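As a rough illustration of why the renderer needs the whole string, the following C++ sketch picks a positional form (isolated, initial, medial, final) for each letter from its neighbours. It is only a toy model of one step a shaping engine performs: the joining table covers just the letters in the question, and every name in it (`Joining`, `Form`, `joining_of`, `positional_forms`, `check_question_string`) is hypothetical, not part of Pango or any other library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names for this sketch; not taken from any library.
enum class Joining { None, Right, Dual };
enum class Form { Isolated, Initial, Medial, Final };

// Joining behaviour for the few letters used in the question only.
// Everything else, including the space, is treated as non-joining.
Joining joining_of(char16_t c) {
    switch (c) {
        case 0x0644:  // LAM
        case 0x0639:  // AIN
        case 0x0628:  // BEH
        case 0x064A:  // YEH
        case 0x062C:  // JEEM
            return Joining::Dual;   // connects to both neighbours
        case 0x0629:  // TEH MARBUTA
        case 0x062F:  // DAL
            return Joining::Right;  // connects only to the preceding letter
        default:
            return Joining::None;
    }
}

// Choose a form for every character of a string held in logical order.
std::vector<Form> positional_forms(const std::u16string& text) {
    std::vector<Form> forms;
    forms.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        const Joining self = joining_of(text[i]);
        // Joined to the previous letter only if that letter connects forward.
        const bool to_prev = i > 0 && self != Joining::None &&
                             joining_of(text[i - 1]) == Joining::Dual;
        // Joined to the next letter only if this one connects forward.
        const bool to_next = i + 1 < text.size() && self == Joining::Dual &&
                             joining_of(text[i + 1]) != Joining::None;
        if (to_prev && to_next)  forms.push_back(Form::Medial);
        else if (to_prev)        forms.push_back(Form::Final);
        else if (to_next)        forms.push_back(Form::Initial);
        else                     forms.push_back(Form::Isolated);
    }
    return forms;
}

// Usage: the string from the question, assumed to use the base letters.
void check_question_string() {
    const std::u16string text =
        u"\u0644\u0639\u0628\u0629 \u062F\u064A\u062F\u0629";
    const std::vector<Form> expected = {
        Form::Initial, Form::Medial, Form::Medial, Form::Final,
        Form::Isolated,  // the space joins nothing
        Form::Isolated, Form::Initial, Form::Final, Form::Isolated};
    assert(positional_forms(text) == expected);
}
```

The same letter (dal, for example) ends up isolated in one position and final in another, so a letter cannot be shaped on its own. A real engine such as Pango additionally handles the full Unicode joining data, ligatures, diacritics and bidirectional reordering, using the tables in the font.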

Note: Perhaps you wanted to write لعبة جديدة (i.e. "new game")? What you give as an example has no meaning in Arabic.

answered Sep 22 '22 03:09 by overcoder