I was reading about Unicode on Wikipedia (Arabic Unicode) and I see that each of the Arabic digits has two Unicode code points. For example, 1 is defined as both U+0661 and U+06F1.
Which one should I use?
The numerals used in the Middle East today are not the ones that gave rise to the "Arabic" numerals used throughout the world. The numerals familiar to us today originated in the western Arabic world of Andalusia/Morocco.
Unicode has a number of characters specifically designated as Roman numerals, as part of the Number Forms range from U+2160 to U+2188. This range includes both upper- and lowercase numerals, as well as pre-combined characters for numbers up to 12 (Ⅻ or XII).
Western nations call them Arabic because Europe got the numerals from the Islamic world, which got them from the Hindus. (People used to pay less attention to the subtleties of multiculturalism.)
The Hindu-Arabic or Indo-Arabic numerals were invented by mathematicians in India. Persian and Arabic mathematicians called them "Hindu numerals". Later they came to be called "Arabic numerals" in Europe because they were introduced to the West by Arab merchants.
According to the code charts, U+0660 .. U+0669 are ARABIC-INDIC DIGIT values 0 through 9, while U+06F0 .. U+06F9 are EXTENDED ARABIC-INDIC DIGIT values 0 through 9.
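To check the two ranges yourself, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `describe_digits` is made up for this illustration; it is not part of any library.

```python
import unicodedata

def describe_digits(first_code_point):
    """Return (character, Unicode name, decimal value) for ten consecutive code points."""
    rows = []
    for offset in range(10):
        ch = chr(first_code_point + offset)
        rows.append((ch, unicodedata.name(ch), unicodedata.decimal(ch)))
    return rows

# U+0660..U+0669 and U+06F0..U+06F9, as listed in the code charts.
ARABIC_INDIC = describe_digits(0x0660)
EXTENDED_ARABIC_INDIC = describe_digits(0x06F0)

for ch, name, value in ARABIC_INDIC + EXTENDED_ARABIC_INDIC:
    print(f"U+{ord(ch):04X} {name} = {value}")
```

Both ranges carry the decimal values 0 through 9, so they differ in name and glyph shape, not in numeric meaning.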
In the Unicode 3.0 book (5.2 is the current version, but these things don't change much once set), the U+066n series of glyphs are marked 'Arabic-Indic digits' and the U+06Fn series of glyphs are marked 'Eastern Arabic-Indic digits (Persian and Urdu)'. It also notes:
For comparison:

Digit  U+066n  U+06Fn
0      ٠       ۰
1      ١       ۱
2      ٢       ۲
3      ٣       ۳
4      ٤       ۴
5      ٥       ۵
6      ٦       ۶
7      ٧       ۷
8      ٨       ۸
9      ٩       ۹
(Whether you can see any of those, and how clearly they are differentiated, may depend on your browser and the fonts installed on your machine as much as anything else. I can see the difference on 4 and 6 clearly; 5 looks much the same in both.)
Based on this information, if you are working with Arabic from the Middle East, use the U+066n series of digits; if you are working with Persian or Urdu, use the U+06Fn series of digits. As a Unicode application, you should accept either set of codes as valid digits (but you might look askance at a sequence that mixed the two sets of digits - or you might just leave well alone).
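As a rough sketch of "accept either set, but notice mixing", something like the following Python might do. The function names `digit_sets_used` and `parse_number` are hypothetical, and whether to reject or merely flag a mixed sequence is left to the caller. It relies on Python's built-in `int()` accepting any Unicode decimal digits.

```python
ARABIC_INDIC = range(0x0660, 0x066A)           # U+0660..U+0669
EXTENDED_ARABIC_INDIC = range(0x06F0, 0x06FA)  # U+06F0..U+06F9

def digit_sets_used(text):
    """Return the set of digit families (by label) that appear in text."""
    sets = set()
    for ch in text:
        cp = ord(ch)
        if cp in ARABIC_INDIC:
            sets.add("arabic-indic")
        elif cp in EXTENDED_ARABIC_INDIC:
            sets.add("extended-arabic-indic")
        elif "0" <= ch <= "9":
            sets.add("ascii")
    return sets

def parse_number(text):
    """Parse a string of decimal digits from any script.

    Returns (value, mixed), where mixed is True if more than one of the
    digit families above appears in the input.
    """
    if not text.isdecimal():  # also False for the empty string
        raise ValueError(f"not a sequence of decimal digits: {text!r}")
    return int(text), len(digit_sets_used(text)) > 1

print(parse_number("\u0661\u0662\u0663"))  # Arabic-Indic 1, 2, 3
print(parse_number("\u06F1\u06F2\u06F3"))  # Extended Arabic-Indic 1, 2, 3
print(parse_number("\u0661\u06F2"))        # one digit from each set
```

Returning the "mixed" flag instead of raising keeps the "leave well alone" option open: the value is still parsed, and the caller decides whether a mixed sequence deserves a warning.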
In general you should not hard-code such info in your application.
There are Arabic countries that don't use the Arabic-Indic digits by default. So there is no direct mapping saying Arabic -> Arabic-Indic digits.
And the user might have changed the defaults in the Control Panel anyway.