UnicodeDecodeError on python3 [duplicate]

Tags: python, regex, utf-8, decoding

I'm currently trying to run some simple regexes over a very big .txt file (a couple of million lines of text). The simplest code that causes the problem:

file = open("exampleFileName", "r")
for line in file:
    pass

The error message:

Traceback (most recent call last):
  File "example.py", line 34, in <module>
    example()
  File "example.py", line 16, in example
    for line in file:
  File "/usr/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 7332: invalid continuation byte

How can I fix this? Is UTF-8 the wrong encoding? And if it is, how do I know which one is right?

Thanks and best regards!


asked Aug 17 '16 16:08

EliteKaffee

1 Answer

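The file is not valid UTF-8. Byte 0xed is the lead byte of a three-byte UTF-8 sequence, and the byte after it is not a valid continuation byte, so the UTF-8 decoder gives up. In Latin-1 (ISO-8859-1), 0xed is simply "í", which suggests the file was saved in a single-byte Western European encoding. Keep in mind that latin-1 maps every possible byte to a character, so it never raises an error; if the decoded text looks garbled, the file is in some other encoding (cp1252 is a common close relative). Try: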

file = open('exampleFileName', 'r', encoding='latin-1')
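A minimal sketch of how one might check candidate encodings and then run a regex over the file line by line. The helper names `decodes_cleanly` and `count_matching_lines`, the sample file contents, the pattern, and the candidate list (`utf-8`, `cp1252`, `latin-1`) are all hypothetical, chosen for illustration. A clean decode is only evidence, not proof, that an encoding is right; you still need to look at the decoded text.

```python
import os
import re
import tempfile


def decodes_cleanly(path, encoding):
    """Return True if the whole file decodes with `encoding` without errors."""
    try:
        with open(path, "r", encoding=encoding) as f:
            for _ in f:  # stream line by line; the file is never fully in memory
                pass
    except UnicodeDecodeError:
        return False
    return True


def count_matching_lines(path, pattern, encoding):
    """Count lines matching the regex `pattern`, reading the file lazily."""
    regex = re.compile(pattern)
    with open(path, "r", encoding=encoding) as f:
        return sum(1 for line in f if regex.search(line))


# Hypothetical sample file: "í" is stored as the single Latin-1 byte 0xED,
# which is not valid UTF-8 when it is followed by an ASCII letter.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("línea uno\nsegunda línea\nthird line\n".encode("latin-1"))
    sample_path = tmp.name

# latin-1 goes last: it maps all 256 byte values to characters, so it
# never raises and cannot by itself tell you it is the right choice.
for enc in ("utf-8", "cp1252", "latin-1"):
    print(enc, decodes_cleanly(sample_path, enc))

print(count_matching_lines(sample_path, r"l[ií]nea", "latin-1"))
os.remove(sample_path)
```

On this sample, utf-8 reports False while cp1252 and latin-1 both report True (they agree on 0xED and differ only in the 0x80 to 0x9F range), and the regex matches 2 lines. Iterating over the file object keeps memory use flat even for millions of lines.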


answered Sep 19 '22 12:09

mic4ael
