"for line in..." results in UnicodeDecodeError: 'utf-8' codec can't decode byte

People also ask

What is UTF-8 codec can't decode byte?

The Python "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte" occurs when bytes are decoded with an encoding other than the one they were written in. To solve the error, specify the correct encoding (e.g. utf-16), or open the file in binary mode (rb for reading, wb for writing) so that no decoding takes place.
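
To see which byte the decoder rejects, one option is to work on the raw bytes and inspect the exception. This is only a sketch: the helper name `find_bad_byte` and the sample bytes are made up for illustration and are not from the original question.

```python
def find_bad_byte(data, encoding="utf-8"):
    """Return (position, byte value) of the first undecodable byte, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first offending byte in exc.object
        return exc.start, exc.object[exc.start]
    return None


# Hypothetical sample: "cafe" with an accented e stored as the single byte 0xE9
sample = b"caf\xe9"
result = find_bad_byte(sample)
print(result)
```

For a real file, the bytes could come from `open(path, "rb").read()`, which reads without decoding anything.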

What does UnicodeDecodeError mean in Python?

The Python "UnicodeDecodeError: 'ascii' codec can't decode byte in position" occurs when we use the ascii codec to decode bytes that were encoded using a different codec. To solve the error, specify the correct encoding, e.g. utf-8 .

What is UTF-8?

UTF-8 (UCS Transformation Format 8) is the World Wide Web's most common character encoding. Each character is represented by one to four bytes. UTF-8 is backward-compatible with ASCII and can represent any standard Unicode character.
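
The "one to four bytes" claim is easy to check in an interpreter. A small sketch, with sample characters chosen here purely for illustration:

```python
# Sample characters (chosen for illustration): A, e-acute, euro sign, an emoji
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Number of bytes each character occupies once encoded as UTF-8
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]

# ASCII text produces identical bytes under both codecs
ascii_compatible = "hello".encode("ascii") == "hello".encode("utf-8")

for ch, n in zip(samples, byte_lengths):
    print("U+%04X -> %d byte(s)" % (ord(ch), n))
```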

As suggested by Mark Ransom, I found the right encoding for that problem. The encoding was "ISO-8859-1", so replacing open("u.item", encoding="utf-8") with open('u.item', encoding = "ISO-8859-1") will solve the problem.

The following also worked for me. ISO-8859-1 maps every possible byte value to a character, so decoding with it never raises an error. I found this especially useful when working with speech recognition APIs.

Example:

file = open('../Resources/' + filename, 'r', encoding="ISO-8859-1")

Your file doesn't actually contain UTF-8 encoded data; it contains some other encoding. Figure out what that encoding is and use it in the open call.

In Windows-1252 encoding, for example, the 0xe9 would be the character é.
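
A quick way to convince yourself of this, as a sketch using made-up bytes rather than the contents of the actual file:

```python
raw = b"caf\xe9"  # hypothetical bytes: 0xE9 stands for an accented e in Windows-1252

# On its own, 0xE9 is not a complete UTF-8 sequence, so UTF-8 decoding fails
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Both single-byte codecs map 0xE9 to U+00E9
as_cp1252 = raw.decode("cp1252")
as_latin1 = raw.decode("latin-1")
```

Windows-1252 and Latin-1 agree on 0xE9, but they differ for bytes in the range 0x80 to 0x9F (for example, 0x80 is the euro sign in Windows-1252), so the two are not always interchangeable.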

Try this to read using Pandas:

pd.read_csv('u.item', sep='|', names=m_cols, encoding='latin-1')

This works:

open('filename', encoding='latin-1')

Or:

open('filename', encoding="ISO-8859-1")
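
The two spellings above should be aliases for the same codec, which can be checked through the codec registry. The sketch below also shows why this encoding never raises on decode. Keep in mind that "never raises" is not the same as "correct": if the file is really in some other encoding, you get wrong characters instead of an error.

```python
import codecs

# Both spellings resolve to the same codec in Python's registry
name_a = codecs.lookup("latin-1").name
name_b = codecs.lookup("ISO-8859-1").name
same_codec = name_a == name_b

# Latin-1 maps each byte value 0-255 to the code point with the same number,
# so any byte string decodes and round-trips unchanged
all_bytes = bytes(bytearray(range(256)))
decoded = all_bytes.decode("latin-1")
round_trip_ok = decoded.encode("latin-1") == all_bytes
```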

If you are using Python 2, the following will be the solution:

import io
for line in io.open("u.item", encoding="ISO-8859-1"):
    pass  # Do something with line

This is needed because in Python 2 the built-in open() does not accept an encoding parameter. If you pass one, you will get the following error:

TypeError: 'encoding' is an invalid keyword argument for this function
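
For code that has to run on both Python 2 and Python 3, io.open is a reasonable choice, since in Python 3 it is the same function as the built-in open(). A self-contained sketch, using a temporary file with made-up contents as a stand-in for u.item:

```python
import io
import os
import tempfile

# Hypothetical stand-in for u.item: one line containing the Latin-1 byte 0xE9
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"1|Caf\xe9 Society (1995)\n")

# io.open accepts the encoding argument on both Python 2 and Python 3
lines = []
with io.open(path, encoding="ISO-8859-1") as f:
    for line in f:
        lines.append(line.rstrip("\n"))

os.remove(path)
```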

Tags:

python

python-3.x

character-encoding
