The Python "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte" occurs when we specify an incorrect encoding when decoding a bytes object. To solve the error, specify the correct encoding, e.g. utf-16 or open the file in binary mode ( rb or wb ).
The Python "UnicodeDecodeError: 'ascii' codec can't decode byte in position" occurs when we use the ascii codec to decode bytes that were encoded using a different codec. To solve the error, specify the correct encoding, e.g. utf-8 .
UTF-8 (UCS Transformation Format 8) is the World Wide Web's most common character encoding. Each character is represented by one to four bytes. UTF-8 is backward-compatible with ASCII and can represent any standard Unicode character.
As suggested by Mark Ransom, I found the right encoding for that problem. The encoding was "ISO-8859-1"
, so replacing open("u.item", encoding="utf-8")
with open('u.item', encoding = "ISO-8859-1")
will solve the problem.
The following also worked for me. ISO 8859-1 is going to save a lot, mainly if using Speech Recognition APIs.
Example:
file = open('../Resources/' + filename, 'r', encoding="ISO-8859-1")
Your file doesn't actually contain UTF-8 encoded data; it contains some other encoding. Figure out what that encoding is and use it in the open
call.
In Windows-1252 encoding, for example, the 0xe9
would be the character é
.
Try this to read using Pandas:
pd.read_csv('u.item', sep='|', names=m_cols, encoding='latin-1')
This works:
open('filename', encoding='latin-1')
Or:
open('filename', encoding="ISO-8859-1")
If you are using Python 2, the following will be the solution:
import io
for line in io.open("u.item", encoding="ISO-8859-1"):
# Do something
Because the encoding
parameter doesn't work with open()
, you will be getting the following error:
TypeError: 'encoding' is an invalid keyword argument for this function
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With