In Python 2 you could do the following to get the current locale's character set:
import string
print string.letters
However, in Python 3 the string module's locale-dependent constants (e.g. string.letters, string.lowercase, string.uppercase, etc.) were removed.
How can I get the current locale's character set using Python 3?
You can use the regular expression r'[^a-zA-Z]' to match the non-alphabetic characters in the string and replace them with an empty string using the re.sub() function. The resulting string will contain only ASCII letters; accented letters such as é or ñ are removed as well.
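As a minimal sketch of this regex approach (the helper name keep_ascii_letters is made up for illustration), the following shows that the class [^a-zA-Z] only knows the 26 ASCII letters:

```python
import re

def keep_ascii_letters(text):
    """Remove every character that is not an ASCII letter."""
    return re.sub(r"[^a-zA-Z]", "", text)

print(keep_ascii_letters("Olá, mundo 123!"))  # prints "Olmundo" (the "á" is dropped too)
```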
The isalpha() method is a built-in string method in Python that returns True if the string is non-empty and every character in it is alphabetic. It follows the Unicode character database, not the current locale, so letters from any script count as alphabetic.
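A short sketch of filtering with isalpha() (keep_letters is a hypothetical helper name, not a standard function):

```python
def keep_letters(text):
    """Keep only the characters that Unicode classifies as letters."""
    return "".join(ch for ch in text if ch.isalpha())

print("abc123".isalpha())               # False, because of the digits
print(keep_letters("Olá, mundo 123!"))  # prints "Olámundo"
```

Unlike the ASCII-only regex, this keeps accented letters, but it still cannot tell you which letters belong to a particular locale's alphabet.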
You can get the exemplar characters for each locale using the PyICU package (imported as icu):
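import locale
from icu import LocaleData

# Note: locale.getdefaultlocale() is deprecated since Python 3.11.
default, encoding = locale.getdefaultlocale()
languages = [default] + ['en_US', 'fr_FR', 'es_ES']
for language in languages:
    # Look up the ICU locale data and its set of exemplar (alphabet) characters.
    data = LocaleData(language)
    alphabet = data.getExemplarSet()
    print(language, alphabet)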
Output:
pt_BR [a-zà-ãçéêíò-õú]
en_US [a-z]
fr_FR [a-zàâæ-ëîïôùûüÿœ]
es_ES [a-záéíñóúü]
To get the alphabet for the current locale only, it is enough to do:
default, _ = locale.getdefaultlocale()
data = LocaleData(default)
alphabet = data.getExemplarSet()
print(default, alphabet)
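Note that locale.getdefaultlocale() is deprecated as of Python 3.11. One possible replacement, sketched here with the standard library only, is to activate the user's locale and read it back with locale.getlocale(). The helper name current_locale_name and the 'en_US' fallback are assumptions made for illustration:

```python
import locale

def current_locale_name(fallback="en_US"):
    """Return the language code of the user's locale, e.g. 'pt_BR'.

    Falls back to `fallback` (an arbitrary choice for this sketch) when the
    locale cannot be set or determined.
    """
    try:
        # An empty string selects the locale configured in the environment.
        locale.setlocale(locale.LC_CTYPE, "")
        language, _encoding = locale.getlocale(locale.LC_CTYPE)
    except (locale.Error, ValueError):
        return fallback
    return language or fallback

print(current_locale_name())
```

The returned name can then be passed to LocaleData in place of default.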