Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

An equivalent to string.ascii_letters for unicode strings in python 2.x?

In the "string" module of the standard library,

string.ascii_letters ## Same as string.ascii_lowercase + string.ascii_uppercase

is

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

Is there a similar constant which would include everything that is considered a letter in unicode?

like image 475
emm Avatar asked Jan 24 '10 09:01

emm


1 Answers

You can construct your own constant of Unicode upper and lower case letters with:

import unicodedata as ud
all_unicode = ''.join(unichr(i) for i in xrange(65536))
unicode_letters = ''.join(c for c in all_unicode
                          if ud.category(c)=='Lu' or ud.category(c)=='Ll')

This makes a string 2153 characters long (narrow Unicode Python build). For code like letter in unicode_letters it would be faster to use a set instead:

unicode_letters = set(unicode_letters)
like image 64
Mark Tolonen Avatar answered Oct 18 '22 19:10

Mark Tolonen