Handle wrongly encoded character in Python unicode string

Tags: python, string, character-encoding, unicode

I am dealing with unicode strings returned by the python-lastfm library.

I assume that somewhere along the way the library gets the encoding wrong and returns a unicode string that may contain invalid characters.

For example, the original string I am expecting in the variable a is "Glück":

>>> a
u'Gl\xfcck'
>>> print a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 2: ordinal not in range(128)

\xfc is the escaped value 252, which corresponds to the Latin-1 encoding of "ü". Somehow this gets embedded in the unicode string in a way Python can't handle on its own.
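As a quick check of what the escape \xfc stands for inside a unicode literal, here is a minimal sketch using only the standard library. The variable names are illustrative and not part of python-lastfm.

```python
import unicodedata

# The single character from position 2 of the string in the question.
ch = u'\xfc'

# Its code point and its official Unicode name.
code_point = ord(ch)
name = unicodedata.name(ch)

# The same character as bytes in two different encodings.
latin1_byte = ch.encode('latin-1')
utf8_bytes = ch.encode('utf-8')
```

Here code_point is 252 and name is 'LATIN SMALL LETTER U WITH DIAERESIS', so the character itself is "ü". Only the byte representation differs between encodings: one byte in Latin-1, two bytes in UTF-8.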

How do I convert this back to a normal or unicode string that contains the original "Glück"? I tried playing around with the decode/encode methods, but either got a UnicodeEncodeError or a string containing the sequence \xfc.

500

asked Apr 22 '11 23:04

strfry

1 Answer

You have to encode your unicode string into a standard (byte) string using some encoding, e.g. UTF-8:

some_unicode_string.encode('utf-8')
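A minimal sketch of that call applied to the string from the question, assuming the value of a is exactly u'Gl\xfcck' as shown there. The other variable names are illustrative only.

```python
# The unicode string from the question.
a = u'Gl\xfcck'

# Explicit encoding to UTF-8 produces a byte string that can be written out.
utf8_bytes = a.encode('utf-8')

# Encoding to ASCII fails, which is the same failure the traceback reports.
try:
    a.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# With errors='replace', unencodable characters become '?' instead of raising.
ascii_fallback = a.encode('ascii', 'replace')

# Decoding the UTF-8 bytes gives back the original unicode string.
round_trip = utf8_bytes.decode('utf-8')
```

The round trip returning the original value suggests the string itself is intact. The error comes from the implicit ASCII encoding that print attempts, not from the string's contents.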

Apart from that: this is a dupe of

BeautifulSoup findall with class attribute- unicode encode error

and at least ten other related questions on SO. Research first.

155

answered Oct 03 '22 16:10

Andreas Jung


