 

Handling of Arabic characters in Unicode


How does Unicode know whether text should be read from right to left or from left to right?

I see this both in Word and in Python.

For example,

هذا هو الملعون جيد رجل الصباح!

If you hit backspace in this text, it is treated as right-to-left text.

I printed the Unicode escape representation of the string, which is:

u'\u0647\u0630\u0627 \u0647\u0648 \u0627\u0644\u0645\u0644\u0639\u0648\u0646 \u062c\u064a\u062f \u0631\u062c\u0644 \u0627\u0644\u0635\u0628\u0627\u062d!'

But I did not see anything in it signifying left to right or right to left.

For normal strings like

Hi how are you

it works from left to right.

Shouldn't there be a Unicode character or byte that signifies the direction?

aceminer asked Apr 03 '16 04:04




1 Answer

The writing direction is a property of each Unicode character. Unicode defines a complex set of properties for each code point: for example, whether it is a number or a mathematical symbol, whether it is alphabetic, its case, its directionality, and which block it is in (which indirectly defines the script).
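As a rough illustration of these per-character properties, here is a minimal Python 3 sketch using the standard `unicodedata` module. The helper name `describe` is made up for this example, and `str.isalpha()` is only a convenient stand-in for the alphabetic property, not an exact match for it.

```python
import unicodedata

def describe(ch):
    """Return a few Unicode properties of a single character (hypothetical helper)."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),         # general category, e.g. Lo, Lu, Nd, Sm
        "bidi_class": unicodedata.bidirectional(ch),  # directionality class
        "is_alpha": ch.isalpha(),                     # approximation of "alphabetic"
    }

# An Arabic letter, a Latin letter, a digit and a mathematical symbol.
for ch in ["\u0647", "A", "5", "+"]:
    print(describe(ch))
```

Each character carries all of these properties independently of the string it appears in, which is why nothing extra shows up in the escaped representation.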

For instance, see http://www.fileformat.info/info/unicode/char/0647/index.htm (this is the first character in your example), which lists the bidi (bidirectionality) class [AL]. This encodes "right-to-left Arabic" as the writing direction for this character.
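You can check the same property from Python without leaving the interpreter. This is a small sketch for Python 3 (the question's `u'...'` literal is Python 2 style, but the escapes are the same); `bidi_classes` is just an illustrative helper name.

```python
import unicodedata

# The two strings from the question.
arabic = ("\u0647\u0630\u0627 \u0647\u0648 \u0627\u0644\u0645\u0644\u0639\u0648\u0646 "
          "\u062c\u064a\u062f \u0631\u062c\u0644 \u0627\u0644\u0635\u0628\u0627\u062d!")
latin = "Hi how are you"

def bidi_classes(s):
    """Return the set of bidi classes used by the characters of s."""
    return {unicodedata.bidirectional(c) for c in s}

print(unicodedata.bidirectional(arabic[0]))  # AL
print(sorted(bidi_classes(arabic)))          # ['AL', 'ON', 'WS']
print(sorted(bidi_classes(latin)))           # ['L', 'WS']
```

Here `AL` is right-to-left Arabic, `L` is left-to-right, `WS` is whitespace and `ON` is "other neutral" (the exclamation mark). The neutral characters take their direction from the surrounding text.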

There are Unicode characters which explicitly set the direction of the text (for example U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK), but these should not normally be required or useful. The text renderer should already know, for each character it renders, from its Unicode properties, which direction it requires (though text converted from other legacy encodings may still contain explicit direction indicator codes).
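If you want to confirm that your string contains no explicit direction characters, something along these lines should work. It is only a sketch: `find_direction_controls` and `base_direction` are hypothetical helpers, and `base_direction` is a simplified "first strong character" guess, not a full implementation of the Unicode bidirectional algorithm that a real renderer applies.

```python
import unicodedata

# Explicit directional formatting characters.
DIRECTION_CONTROLS = frozenset(
    "\u200e\u200f"                    # LRM, RLM
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)

def find_direction_controls(s):
    """Return (index, name) for every explicit direction character in s."""
    return [(i, unicodedata.name(c)) for i, c in enumerate(s)
            if c in DIRECTION_CONTROLS]

def base_direction(s):
    """Guess the direction from the first strongly directional character.

    Simplification: ignores embeddings and isolates; defaults to "ltr".
    """
    for c in s:
        cls = unicodedata.bidirectional(c)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return "ltr"

arabic = ("\u0647\u0630\u0627 \u0647\u0648 \u0627\u0644\u0645\u0644\u0639\u0648\u0646 "
          "\u062c\u064a\u062f \u0631\u062c\u0644 \u0627\u0644\u0635\u0628\u0627\u062d!")

print(find_direction_controls(arabic))  # [] (no explicit controls)
print(base_direction(arabic))           # rtl
print(base_direction("Hi how are you")) # ltr
```

The empty list for the example string matches what you observed: the direction comes from the letters themselves, not from a marker character.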

tripleee answered Sep 28 '22 03:09