I know unicode contains all characters from most world aphabets..but what about digits? Are they part of unicode or not? I was not able to find straight answer. Thanks
A numeral (often called number in Unicode) is a character that denotes a number. The decimal number digits 0–9 are used widely in various writing systems throughout the world, however the graphemes representing the decimal digits differ widely.
A single number is assigned to each code element defined by the Unicode Standard. Each of these numbers is called a code point and, when referred to in text, is listed in hexadecimal form following the prefix "U+". For example, the code point U+0041 is the hexadecimal number 0041 (equal to the decimal number 65).
As already stated, Indo-Arabic numerals (0,1,..,9) are included in Unicode, inherited from ASCII. If you're talking about representation of numbers in other languages, the answer is still yes, they are also part of Unicode.
//numbers (0-9) in Malayalam (language spoken in Kerala, India)
൦ ൧ ൨ ൩ ൪ ൫ ൬ ൭ ൮ ൯
//numbers (0-9) in Hindi (India's national language)
० १ २ ३ ४ ५ ६ ७ ८ ९
You can use \p{N}
or \p{Number}
in a regular expression to match any kind of numeric character in any script.
This document (Page-3) describes the Unicode code points for Malayalam digits.
In short: yes, of course. There are three categories in UNICODE containing various representations of digits and numbers:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With