Python3 UnicodeDecodeError with readlines() method

I'm trying to create a Twitter bot that reads lines from a file and posts them. I'm using Python 3 and tweepy in a virtualenv on my shared server space. This is the part of the code that seems to be causing trouble:

    #!/foo/env/bin/python3

    import re
    import tweepy, time, sys

    argfile = str(sys.argv[1])

    filename=open(argfile, 'r')
    f=filename.readlines()
    filename.close()

This is the error I get:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xfe in position 0: ordinal not in range(128)

The error specifically points to f=filename.readlines() as the source of the error. Any idea what might be wrong? Thanks.

asked Jan 27 '16 04:01

2 Answers

I think the best answer (in Python 3) is to use the errors= parameter:

    with open('evil_unicode.txt', 'r', errors='replace') as f:
        lines = f.readlines()

Proof:

    >>> s = b'\xe5abc\nline2\nline3'
    >>> with open('evil_unicode.txt','wb') as f:
    ...     f.write(s)
    ...
    16
    >>> with open('evil_unicode.txt', 'r') as f:
    ...     lines = f.readlines()
    ...
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
      File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/codecs.py", line 319, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 0: invalid continuation byte
    >>> with open('evil_unicode.txt', 'r', errors='replace') as f:
    ...     lines = f.readlines()
    ...
    >>> lines
    ['�abc\n', 'line2\n', 'line3']

Note that errors= can be 'replace' or 'ignore' (other handlers exist as well). Here's what 'ignore' looks like:

    >>> with open('evil_unicode.txt', 'r', errors='ignore') as f:
    ...     lines = f.readlines()
    ...
    >>> lines
    ['abc\n', 'line2\n', 'line3']
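To compare the handlers side by side without touching a file, here is a small sketch that decodes the same bytes directly with bytes.decode, which takes the same errors= values as open(). The 'backslashreplace' and 'surrogateescape' handlers are additions that the answer above does not mention; unlike 'replace' and 'ignore', they keep the bad byte visible or recoverable instead of discarding it.

```python
# The same invalid bytes used in the proof above.
data = b'\xe5abc\nline2\nline3'

# Decode with several error handlers and collect the results.
results = {
    handler: data.decode('utf-8', errors=handler)
    for handler in ('replace', 'ignore', 'backslashreplace', 'surrogateescape')
}

for handler, text in results.items():
    # ascii() keeps the output printable even if the terminal cannot
    # display U+FFFD or lone surrogates.
    print(f'{handler:>16}: {ascii(text)}')

# surrogateescape is reversible: encoding with the same handler
# restores the original bytes exactly.
round_trip = results['surrogateescape'].encode('utf-8', errors='surrogateescape')
```

Keep in mind that 'replace' and 'ignore' lose data, so they suit a bot that only needs readable text, not a program that has to write the file back unchanged.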

answered Sep 17 '22 14:09

Your default encoding appears to be ASCII, whereas the input is in some other encoding, more than likely UTF-8. When the decoder hits non-ASCII bytes in the input, it raises the exception. readlines itself isn't really responsible for the problem; it merely triggers the read and decode, and the decode is what fails.
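Both points can be checked quickly. As a sketch: locale.getpreferredencoding(False) reports the encoding that open() falls back to when no encoding= is given (the value depends on the server's locale settings and the Python version, so your output may differ), and decoding the offending byte by hand reproduces the error with no readlines call at all.

```python
import locale

# Encoding that open() uses in text mode when encoding= is omitted.
# On older Python 3 versions under a C/POSIX locale this is often an
# ASCII alias such as 'ANSI_X3.4-1968'.
default_encoding = locale.getpreferredencoding(False)
print('default encoding:', default_encoding)

# Decoding the byte from the question directly fails the same way,
# with no file or readlines() involved.
try:
    b'\xfe'.decode('ascii')
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)
```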

It's an easy fix, though: the built-in open in Python 3 lets you provide the known encoding of the input, replacing the default (ASCII in your case) with any other recognized encoding. Providing it lets you keep reading the file as str (rather than as raw bytes objects, which behave quite differently), while Python does the work of converting the raw bytes on disk to true text:

    # Using a with statement closes the file for us without needing to remember to close
    # explicitly, and closes even when exceptions occur
    with open(argfile, encoding='utf-8') as inf:
        f = inf.readlines()
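One caveat worth checking: the error in the question reports byte 0xfe at position 0. That byte never occurs in valid UTF-8, and it is the first byte of the UTF-16 big-endian byte order mark (FE FF), so the file may actually be UTF-16, in which case encoding='utf-8' would still fail. A minimal sketch that looks for a BOM before choosing the encoding follows; the helper names sniff_bom_encoding and read_lines are hypothetical, and falling back to UTF-8 when no BOM is found is an assumption, not something the question confirms.

```python
import codecs
import os
import tempfile


def sniff_bom_encoding(path, fallback='utf-8'):
    """Guess an encoding from a leading byte order mark (hypothetical helper)."""
    with open(path, 'rb') as raw:
        head = raw.read(4)
    # Check UTF-32 first: its little-endian BOM begins with the UTF-16 LE BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return 'utf-32'
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'
    if head.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    return fallback  # assumption: no BOM means the fallback encoding


def read_lines(path):
    """Read all lines as str, using the BOM-derived encoding."""
    with open(path, encoding=sniff_bom_encoding(path)) as inf:
        return inf.readlines()


# Demo: build a UTF-16 big-endian file that starts with FE FF.
payload = codecs.BOM_UTF16_BE + 'first tweet\nsecond tweet\n'.encode('utf-16-be')
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
try:
    detected = sniff_bom_encoding(tmp.name)
    lines = read_lines(tmp.name)
finally:
    os.remove(tmp.name)

print(detected, lines)
```

The 'utf-16', 'utf-32' and 'utf-8-sig' codecs consume the BOM themselves, so it does not show up at the start of the first line.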

answered Sep 16 '22 14:09

ShadowRanger

Tags: python, python-3.x, unicode, sys, tweepy