I am dealing with unicode strings returned by the python-lastfm library.
I assume somewhere on the way, the library gets the encoding wrong and returns a unicode string that may contain invalid characters.
For example, the original string i am expecting in the variable a is "Glück"
>>> a u'Gl\xfcck' >>> print a Traceback (most recent call last): File "", line 1, in UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 2: ordinal not in range(128)
\xfc is the escaped value 252, which corresponds to the latin1 encoding of "ü". Somehow this gets embedded in the unicode string in a way python can't handle on its own.
How do i convert this back a normal or unicode string that contains the original "Glück"? I tried playing around with the decode/encode methods, but either got a UnicodeEncodeError, or a string containing the sequence \xfc.
The key to troubleshooting Unicode errors in Python is to know what types you have. Then, try these steps: If some variables are byte sequences instead of Unicode objects, convert them to Unicode objects with decode() / u” before handling them.
In python, to remove Unicode character from string python we need to encode the string by using str. encode() for removing the Unicode characters from the string.
Python's string type uses the Unicode Standard for representing characters, which lets Python programs work with all these different possible characters. Unicode (https://www.unicode.org/) is a specification that aims to list every character used by human languages and give each character its own unique code.
You have to convert your unicode string into a standard string using some encoding e.g. utf-8:
some_unicode_string.encode('utf-8')
Apart from that: this is a dupe of
BeautifulSoup findall with class attribute- unicode encode error
and at least ten other related questions on SO. Research first.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With