Why does the code below fail, and why does it succeed with the "latin-1" codec?
o = "a test of \xe9 char"  # I want this to remain a string as this is what I am receiving
v = o.decode("utf-8")
Which results in:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 10: invalid continuation byte
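The Python "UnicodeDecodeError: 'utf-8' codec can't decode byte in position: invalid continuation byte" occurs when we specify an incorrect encoding when decoding a bytes object. In the question, the byte 0xe9 is "é" in latin-1, but in UTF-8 it marks the start of a three-byte sequence, so the decoder expects two continuation bytes in the range 0x80 to 0xBF. The next byte is a space (0x20), hence "invalid continuation byte". Decoding with latin-1 succeeds because that codec maps every possible byte value to a character. To solve the error, specify the encoding the data was actually written in, e.g. latin-1.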
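As a minimal sketch of the failure and the fix, the snippet below decodes the same bytes as in the question. It uses a `b"..."` literal so it also runs on Python 3, which is an assumption on my part since the question uses a Python 2 `str`; the variable names are only illustrative.

```python
# Same bytes as in the question; b"..." keeps this valid on Python 2.7 and 3
raw = b"a test of \xe9 char"

failure = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xE9 opens a 3-byte UTF-8 sequence, but the next byte is a plain space
    failure = (exc.start, exc.reason)

# latin-1 maps byte 0xE9 directly to U+00E9, so this decode cannot fail
text = raw.decode("latin-1")
```

Note that latin-1 never raises on any input, so it only gives the right text when the data really was written in latin-1.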
The Python "UnicodeDecodeError: 'ascii' codec can't decode byte in position" occurs when we use the ascii codec to decode bytes that were encoded using a different codec. To solve the error, specify the correct encoding, e.g. utf-8.
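The ascii case can be sketched the same way. The sample bytes here are a made-up example (the UTF-8 encoding of "café"), not data from the question.

```python
# Hypothetical sample: "café" encoded as UTF-8 (é becomes the two bytes C3 A9)
data = b"caf\xc3\xa9"

ascii_failed = False
try:
    data.decode("ascii")
except UnicodeDecodeError:
    # ascii only covers bytes 0x00-0x7F, so 0xC3 is rejected
    ascii_failed = True

# Decoding with the codec the bytes were written in works
word = data.decode("utf-8")
```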
I had the same error when I tried to open a CSV file with the pandas.read_csv method. The solution was to change the encoding to latin-1:
pd.read_csv('ml-100k/u.item', sep='|', names=m_cols, encoding='latin-1')
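The same idea can be shown without pandas. The sketch below uses only the standard library and assumes Python 3; `read_rows` is a hypothetical helper and the sample row is invented to stand in for a pipe-separated file saved as latin-1. It tries UTF-8 first and falls back to latin-1 only when decoding fails.

```python
import csv
import io

# Invented sample row standing in for a pipe-separated file saved as latin-1
raw = u"1|Les Mis\xe9rables (1995)\n".encode("latin-1")

def read_rows(data, encoding):
    """Decode raw bytes with the given encoding and parse pipe-separated rows."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return list(csv.reader(stream, delimiter="|"))

try:
    rows = read_rows(raw, "utf-8")
    used = "utf-8"
except UnicodeDecodeError:
    # Fall back only after UTF-8 has been ruled out
    rows = read_rows(raw, "latin-1")
    used = "latin-1"
```

Trying UTF-8 first is a design choice: latin-1 accepts any byte sequence, so using it as the default would silently garble files that are actually UTF-8.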