Given a Unicode character, what would be the simplest way to return its script (as "Latin", "Hangul", etc.)? unicodedata doesn't seem to provide this kind of feature.
To insert a Unicode character in Microsoft Word, type the character's hexadecimal code and then press Alt+X. For example, to type a dollar symbol ($), type 0024 and then press Alt+X.
In Python 3, the built-in functions chr() and ord() convert between Unicode code points and characters. A character can also be written in a string literal as a hexadecimal code point escape: \x (two hex digits), \u (four hex digits), or \U (eight hex digits).
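As a small illustration of the conversions just described (Python 3 syntax; the variable names are only for this example):

```python
# ord() maps a one-character string to its code point; chr() is the inverse.
code_point = ord("Ф")           # CYRILLIC CAPITAL LETTER EF
round_trip = chr(code_point)

# The same character written with escape sequences in string literals.
escaped_u = "\u0424"            # \u takes exactly four hex digits
escaped_U = "\U00000424"        # \U takes exactly eight hex digits
escaped_x = "\x41"              # \x takes two hex digits; this is "A"

print(hex(code_point), round_trip, escaped_u, escaped_U, escaped_x)
```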
Q: How many characters are in Unicode? The short answer is that as of Version 15.0, the Unicode Standard contains 149,186 characters.
The Unicode Standard is a character encoding standard that includes characters from almost all of the living languages of the world. It is not a fixed-length scheme: code points can be stored in UTF-8 or UTF-16, which are variable-length, or in UTF-32, which is fixed-length. Information about Unicode can be found in The Unicode Standard and on the Unicode Consortium website at www.unicode.org.
I was hoping someone's done it before, but apparently not, so here's what I've ended up with. The module below (I call it unicodedata2) extends unicodedata and provides script_cat(chr), which returns a tuple (Script name, Category) for a unicode char. Example:
# coding=utf8
import unicodedata2
print unicodedata2.script_cat(u'Ф') #('Cyrillic', 'L')
print unicodedata2.script_cat(u'の') #('Hiragana', 'Lo')
print unicodedata2.script_cat(u'★') #('Common', 'So')
The module: https://gist.github.com/2204527
It seems to me that the Python unicodedata module contains tools for accessing the main file in the Unicode database but nothing for the other files: “The data in this database is based on the UnicodeData.txt file”
The script information is in the Scripts.txt file. It is of relatively simple format (described in UAX #44) and not horribly large (131 kilobytes), so you might consider parsing it in your program. Note that in the Unicode classification, there’s the “Common” script that contains characters used in different scripts, like punctuation marks.
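A minimal sketch of that parsing approach, in Python 3. The names parse_scripts, script_of, and SAMPLE are hypothetical, and SAMPLE is just a few lines written in the Scripts.txt format so the example runs without the real file. In practice you would read the lines from a local copy of Scripts.txt from the Unicode Character Database instead.

```python
import bisect

# A few lines in the Scripts.txt format: "range ; Script # comment".
# Illustrative sample only, not the full file.
SAMPLE = """\
# Comment lines and blank lines are ignored.

0020          ; Common # Zs       SPACE
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0400..0481    ; Cyrillic # L& [130] CYRILLIC CAPITAL LETTER IE WITH GRAVE..CYRILLIC SMALL LETTER KOPPA
3041..3096    ; Hiragana # Lo  [86] HIRAGANA LETTER SMALL A..HIRAGANA LETTER SMALL KE
"""


def parse_scripts(lines):
    """Return a sorted list of (first, last, script) code point ranges."""
    ranges = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 2:
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first  # single code point or range
        ranges.append((first, last, fields[1]))
    ranges.sort()
    return ranges


def script_of(char, ranges):
    """Look up the script of one character by binary search over the ranges."""
    cp = ord(char)
    starts = [first for first, _, _ in ranges]
    i = bisect.bisect_right(starts, cp) - 1  # last range starting at or before cp
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return "Unknown"  # code points not listed default to the Unknown script


ranges = parse_scripts(SAMPLE.splitlines())
print(script_of("Ф", ranges))
print(script_of("の", ranges))
```

With the real file, replace SAMPLE.splitlines() with the lines of Scripts.txt opened as UTF-8 text. For many lookups, build the starts list once rather than on every call.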