
Python string.letters does not include locale diacritics

I am trying to get the alphabet from Python's string module for a given locale, including its diacritics (e.g. éèêà... for French), without success. Here is a minimal example:

import locale, string

locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
print string.letters
# shows ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

locale.setlocale(locale.LC_ALL, 'fr_FR.UTF-8')
print string.letters
# also shows ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

The Python documentation says that string.letters is locale-dependent, but that does not seem to work for me.

What am I doing wrong, and is this the right way to obtain a language-dependent alphabet?

Edit: I checked the locale with print locale.getlocale() after setting it, and it has changed correctly.

F. Boudin asked Oct 17 '22 23:10



1 Answer

In Python 2.7 (Python 3.x has no string.letters), it works if you set the locale to 'fr_FR' (equivalent to 'fr_FR.ISO8859-1') rather than 'fr_FR.UTF-8'. The same holds for other languages, for example Spanish:

>>> import locale, string
>>> locale.setlocale(locale.LC_ALL, 'es_ES')
'es_ES'
>>> string.letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\xaa\xb5\xba\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> locale.setlocale(locale.LC_ALL, 'es_ES.UTF-8')
'es_ES.UTF-8'
>>> string.letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'

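So \xaa is the character "ª", \xb5 is "µ", \xd1 is "Ñ", and so on. Note that these are single bytes in ISO-8859-1 (Latin-1), not Unicode text, so they only display correctly once decoded with that encoding. In a UTF-8 locale each accented letter takes more than one byte, which is most likely why string.letters falls back to plain ASCII there.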
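To make the byte values in that output readable, they can be decoded as Latin-1. The sketch below is written for Python 3 so it runs on a current interpreter. The variable names are only illustrative, and the short byte string is a hand-picked sample from the output above, not the full list.

```python
# A few of the escaped bytes from the 'es_ES' output above (hand-picked sample).
latin1_bytes = b'\xaa\xb5\xd1\xe9'

# Each byte is one character in ISO-8859-1 (Latin-1).
decoded = latin1_bytes.decode('latin-1')
print(decoded)  # ªµÑé

# The same "é" needs two bytes in UTF-8, so it cannot be classified
# as a letter one byte at a time.
utf8_e_acute = '\xe9'.encode('utf-8')
print(utf8_e_acute)  # b'\xc3\xa9'
```

The two-byte UTF-8 form is a plausible reason the single-byte letter table stays ASCII-only under a UTF-8 locale.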

I highly recommend reading this: https://pythonhosted.org/kitchen/unicode-frustrations.html
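Since string.letters is gone in Python 3, here is a rough, locale-independent sketch that reproduces the Latin-1 letter set shown above using Unicode character properties. The helper name latin1_letters is hypothetical, and this is only one possible approach, not a drop-in replacement. It does not give a per-language alphabet, just every letter in the first 256 code points.

```python
def latin1_letters():
    """Return all alphabetic characters among code points 0-255 (hypothetical helper)."""
    # str.isalpha() uses the Unicode database, so no locale setting is involved.
    return ''.join(chr(c) for c in range(256) if chr(c).isalpha())

letters = latin1_letters()
print(letters)
```

The result starts with the 52 ASCII letters and continues with ª, µ, º and the accented Latin-1 letters, skipping × and ÷ just as the 'es_ES' output does.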

Alberto answered Nov 03 '22 07:11
