I'm seeking simple Python function that takes a string and returns a similar one but with all non-ascii characters converted to their closest ascii equivalent. For example, diacritics and whatnot should be dropped. I'm imagining there must be a pretty canonical way to do this and there are plenty of related stackoverflow questions but I'm not finding a simple answer so it seemed worth a separate question.
Example input/output:
"Étienne" -> "Etienne"
Reading this question made me go looking for something better.
https://pypi.python.org/pypi/Unidecode/0.04.1
Does exactly what you ask for.
In Python 3 and using the regex implementation at PyPI:
http://pypi.python.org/pypi/regex
Starting with the string:
>>> s = "Étienne"
Normalise to NFKD and then remove the diacritics:
>>> import unicodedata
>>> import regex
>>> regex.sub(r"\p{Mn}", "", unicodedata.normalize("NFKD", s))
'Etienne'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With