What's a good way to replace international characters with their base Latin counterparts using Python?

Say I have the string "blöt träbåt", which contains a and o with an umlaut and an a with a ring above. I want it to become "blot trabat" as simply as possible. I've done some digging and found the following method:

import unicodedata
# Python 2: unicode() is the text type (plain str in Python 3)
unicode_string = unicodedata.normalize('NFKD', unicode(string))

This gives me a unicode string in which each international character is split into a base letter and a combining character (\u0308 for the umlaut, \u030a for the ring above). To get this back to an ASCII string I can then do ascii_string = unicode_string.encode('ASCII', 'ignore'), which drops the combining characters and results in the string "blot trabat".

The question here is: is there a better way to do this? It feels roundabout, and I suspect there might be something I don't know about. I could of course wrap it up in a helper function, but I'd rather check whether this already exists in Python.
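A minimal sketch of such a helper function, assuming Python 3 (where str is already Unicode text, so no unicode() call is needed). The name strip_accents is hypothetical and not part of the standard library:

```python
import unicodedata


def strip_accents(text):
    """Return text with accented Latin letters reduced to their base letters."""
    # NFKD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with errors="ignore" silently drops every non-ASCII
    # code point, which includes the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_accents("blöt träbåt"))  # blot trabat
```

Note that characters with no decomposition, such as "ß" or "ø", are not mapped to a base letter by this approach; they are dropped entirely.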

How do you replace special characters in Python?

Using str.replace(), we can replace a specific character. To remove that character instead, replace it with an empty string. The str.replace() method replaces all occurrences of the given character or substring and returns a new string.
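As a small illustration, assuming Python 3, chained str.replace() calls can handle the example string from the question, and an empty replacement removes a character. The variable names are only for illustration:

```python
text = "blöt träbåt"

# Replace each accented letter with its base letter, one call per character.
replaced = text.replace("ö", "o").replace("ä", "a").replace("å", "a")
print(replaced)  # blot trabat

# Replacing with an empty string removes every occurrence of the character.
removed = text.replace("ö", "")
print(removed)  # blt träbåt
```

This works for a handful of known characters, but it needs one call per character, so it scales poorly compared with a normalization or table-based approach.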

How do you replace letters with letters in Python?

The replace() method replaces occurrences of a given old substring with a new one. It takes the parameters old (the character or substring to replace), new (the replacement character or substring), and an optional count (the maximum number of occurrences to replace).
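A short sketch of the count parameter, assuming Python 3; the sample string is made up for illustration:

```python
path = "a-b-c-d"

# Without count, every occurrence is replaced.
all_replaced = path.replace("-", "+")
print(all_replaced)  # a+b+c+d

# With count=2, only the first two occurrences (from the left) are replaced.
first_two = path.replace("-", "+", 2)
print(first_two)  # a+b+c-d
```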

Which Python function is used to replace a character in a string with another character?

The string replace() method replaces each matching occurrence of the old character or text in the string with the new character or text.

How do you use non-ASCII characters in Python?

In order to use non-ASCII characters, Python requires explicit encoding and decoding of strings into Unicode. In IBM® SPSS® Modeler, Python scripts are assumed to be encoded in UTF-8, which is a standard Unicode encoding that supports non-ASCII characters.
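To make explicit encoding and decoding concrete, here is a minimal sketch in plain Python 3 (an assumption; it is not specific to SPSS Modeler). It converts text to UTF-8 bytes and back:

```python
text = "blöt träbåt"

# Encoding turns Unicode text (str) into bytes using a named encoding.
encoded = text.encode("utf-8")

# Decoding turns those bytes back into text using the same encoding.
decoded = encoded.decode("utf-8")

# Each of the three accented letters takes two bytes in UTF-8,
# so the byte string is longer than the text.
print(len(text), len(encoded))  # 11 14
print(decoded == text)  # True
```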

It would be better to create an explicit table and then use the unicode.translate method. The advantage is that the transliteration is more precise, e.g. transliterating "ö" to "oe" and "ß" to "ss", as should be done in German.
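A sketch of the table approach, assuming Python 3, where the method is str.translate and the table can be built with str.maketrans. The table contents and the names GERMAN_TABLE and transliterate_german are illustrative assumptions, not a complete or authoritative mapping:

```python
# Explicit transliteration table for German; extend it for other languages.
# str.maketrans accepts a dict whose keys are single characters and whose
# values may be replacement strings of any length.
GERMAN_TABLE = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def transliterate_german(text):
    """Replace German umlauts and sharp s according to GERMAN_TABLE."""
    return text.translate(GERMAN_TABLE)


print(transliterate_german("Größe"))  # Groesse
print(transliterate_german("Straße"))  # Strasse
```

Characters that are not in the table are left unchanged, so the "å" in "blöt träbåt" would survive this particular table.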

There are several transliteration packages on PyPI: translitcodec, Unidecode, and trans.

Tags:

python

string

internationalization

Blixt

Martin v. Löwis
