I am trying to get the alphabet for a given locale from Python's string module, including the diacritics (e.g. é, è, ê, à for French), with no success. Here is a minimal example:
import locale, string
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
print string.letters
# shows ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
locale.setlocale(locale.LC_ALL, 'fr_FR.UTF-8')
print string.letters
# also shows ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
The Python documentation says that string.letters is locale-dependent, but that does not seem to work for me.
What am I doing wrong, and is this the right way to obtain a language-dependent alphabet?
Edit: I just checked the locale with print locale.getlocale() after setting it, and it is correctly changed.
In Python 2.7 (there is no string.letters in Python 3.x), it works if you set the locale to 'fr_FR' (typically equivalent to 'fr_FR.ISO8859-1') rather than 'fr_FR.UTF-8'. The same holds for other languages, for example Spanish:
>>> import locale, string
>>> locale.setlocale(locale.LC_ALL, 'es_ES')
'es_ES'
>>> string.letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\xaa\xb5\xba\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> locale.setlocale(locale.LC_ALL, 'es_ES.UTF-8')
'es_ES.UTF-8'
>>> string.letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
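So \xaa is the character "ª", \xb5 is "µ", \xd1 is "Ñ" and so on. These are single-byte ISO 8859-1 (Latin-1) values, not UTF-8, so they will look broken if you print them on a UTF-8 terminal. In Python 2, string.letters is a byte string in which each letter is a single byte; in a UTF-8 locale the accented letters need more than one byte each, which is most likely why only the ASCII letters show up there.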
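To make the byte values above readable, here is a minimal sketch, written for Python 3, that decodes a few of them as Latin-1 and compares their size with the UTF-8 encoding of the same characters. The sample bytes are just a subset picked from the output above, and the variable names are illustrative.

```python
# A few byte values copied from the string.letters output above
# (a hand-picked subset, not the full list).
latin1_bytes = b'\xaa\xb5\xba\xc0\xd1\xe9'

# In ISO 8859-1 (Latin-1) every byte maps to exactly one character.
decoded = latin1_bytes.decode('latin-1')
print(decoded)

# The same characters take two bytes each once encoded as UTF-8,
# so they cannot be listed as single bytes in a UTF-8 locale.
utf8_bytes = decoded.encode('utf-8')
print(len(latin1_bytes))
print(len(utf8_bytes))
```

If you need to display the Python 2 result on a UTF-8 terminal, decoding it from Latin-1 first, as above, should give readable characters.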
I do highly recommend reading this: https://pythonhosted.org/kitchen/unicode-frustrations.html
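Since string.letters is gone in Python 3, one possible workaround is sketched below. Note the assumptions: it does not consult the locale at all, it simply keeps the characters in the Latin-1 range that Unicode classifies as letters, and the helper name latin1_letters is made up for this example. The result appears to match the 'es_ES' output shown above, but it is not restricted to the letters of any single language.

```python
def latin1_letters():
    """Return the characters in the range 0-255 that Unicode treats as letters.

    Hypothetical helper: it ignores the current locale and relies only on
    str.isalpha(), which uses Unicode character properties in Python 3.
    """
    return ''.join(chr(cp) for cp in range(256) if chr(cp).isalpha())


letters = latin1_letters()
print(letters)
```

Here str.isalpha() does the work that the C locale's character classification did for string.letters in Python 2, so punctuation such as "«" and symbols such as "×" are left out.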