Given a Unicode character, what would be the simplest way to return its script (such as "Latin" or "Hangul")? The unicodedata module doesn't seem to provide this kind of feature.


Find out the unicode script of a character

2 Answers

I was hoping someone had done this before, but apparently not, so here's what I ended up with. The module below (I call it unicodedata2) extends unicodedata and provides script_cat(chr), which returns a tuple (script name, category) for a Unicode character. Example:

# coding=utf8
import unicodedata2
print unicodedata2.script_cat(u'Ф')  #('Cyrillic', 'L')
print unicodedata2.script_cat(u'の')  #('Hiragana', 'Lo')
print unicodedata2.script_cat(u'★')  #('Common', 'So')

The module: https://gist.github.com/2204527

answered Oct 08 '22 06:10

georg

It seems to me that the Python unicodedata module contains tools for accessing the main file of the Unicode database but nothing for the other files: “The data in this database is based on the UnicodeData.txt file”.

The script information is in the Scripts.txt file. Its format is relatively simple (described in UAX #44) and the file is not horribly large (131 kilobytes), so you might consider parsing it in your program. Note that the Unicode classification includes a “Common” script, which covers characters used in several different scripts, such as punctuation marks.
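A minimal sketch of that parsing approach, in Python 3, might look like the following. The names parse_scripts and script_of are made up for illustration, and SAMPLE is a short hand-written excerpt in the Scripts.txt line format rather than the real file, whose ranges differ between Unicode versions. Code points that are not listed fall back to "Unknown", which is an assumption of this sketch.

```python
import bisect

# Hand-written excerpt in the Scripts.txt line format (illustrative only;
# the real file is far longer and its ranges vary between Unicode versions).
SAMPLE = """\
# code point(s) ; script name # trailing comment
0020          ; Common # Zs       SPACE
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
0400..0481    ; Cyrillic # L& [130] CYRILLIC CAPITAL LETTER IE WITH GRAVE..CYRILLIC SMALL LETTER KOPPA
3041..3096    ; Hiragana # Lo  [86] HIRAGANA LETTER SMALL A..HIRAGANA LETTER VO
"""


def parse_scripts(lines):
    """Return a sorted list of (start, end, script) tuples (hypothetical helper)."""
    ranges = []
    for line in lines:
        # Drop the trailing comment, then skip blank and comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        code_points, script = (field.strip() for field in data.split(";"))
        # A line holds either a single code point or a "first..last" range.
        first, _, last = code_points.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        ranges.append((start, end, script))
    ranges.sort()
    return ranges


def script_of(char, ranges, default="Unknown"):
    """Look up the script of a single character by binary search."""
    cp = ord(char)
    # Index of the last range whose start is <= cp; 0x110000 is above any
    # valid code point, so ranges starting exactly at cp are included.
    i = bisect.bisect_right(ranges, (cp, 0x110000)) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return default


RANGES = parse_scripts(SAMPLE.splitlines())
print(script_of("Ф", RANGES))
print(script_of("の", RANGES))
```

To use the real data, one could download Scripts.txt from the Unicode Character Database and pass the open file object to parse_scripts in place of SAMPLE.splitlines(); the local file path would be whatever you choose. Keeping the ranges sorted makes each lookup a binary search instead of a scan over every range.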

answered Oct 08 '22 08:10

Jukka K. Korpela
