
UnicodeDecodeError on python3 [duplicate]

I'm currently trying to run some simple regex over a very large .txt file (a couple of million lines of text). The simplest code that reproduces the problem:

file = open("exampleFileName", "r")
for line in file:
    pass

The error message:

Traceback (most recent call last):
  File "example.py", line 34, in <module>
    example()
  File "example.py", line 16, in example
    for line in file:
  File "/usr/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 7332: invalid continuation byte

How can I fix this? Is UTF-8 the wrong encoding? And if it is, how do I know which one is right?

Thanks and best regards!

EliteKaffee asked Aug 17 '16 16:08



1 Answer

It looks like the file is not valid UTF-8, so you should try reading it with the latin-1 encoding. Try:

file = open('exampleFileName', 'r', encoding='latin-1') 
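
To check whether latin-1 is a plausible guess, it can help to look at the bytes around the position the traceback reports. The following is only a minimal sketch: the helper names `find_first_bad_byte` and `count_lines` and the temporary demo file are made up for illustration, and it reads the whole file into memory, which may not suit a very large file. Note that latin-1 maps every possible byte to a character, so decoding with it never raises this error, but the text will look wrong if the file was actually written in some other encoding.

```python
import os
import tempfile


def find_first_bad_byte(path, encoding="utf-8"):
    """Return (offset, nearby_bytes) for the first undecodable byte, or None.

    Hypothetical helper: reads the whole file at once, so it is only a sketch.
    """
    with open(path, "rb") as fh:  # binary mode: no decoding happens here
        data = fh.read()
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        start = max(exc.start - 10, 0)
        return exc.start, data[start:exc.end + 10]
    return None


def count_lines(path, encoding="latin-1"):
    """Iterate over the file with an explicit encoding and count its lines."""
    with open(path, "r", encoding=encoding) as fh:  # closes the file for us
        return sum(1 for _ in fh)


# Demo on a small throwaway file written as latin-1 (an assumption made
# for this example; the real file's encoding is unknown).
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("aqu\u00ed est\u00e1\nline two\n".encode("latin-1"))
    demo_path = tmp.name

bad = find_first_bad_byte(demo_path)  # offset of the 0xed byte plus context
n_lines = count_lines(demo_path)      # reading as latin-1 does not raise
os.remove(demo_path)

print(bad, n_lines)
```

If the surrounding bytes look like ordinary text with single accented characters, a single-byte encoding such as latin-1 or cp1252 is a reasonable guess. If you only need to get past the bad bytes, `open(..., encoding="utf-8", errors="replace")` is another option, at the cost of replacing undecodable bytes with a placeholder character.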
mic4ael answered Sep 19 '22 12:09