Trying to create a Twitter bot that reads lines and posts them. Using Python 3 and tweepy, via a virtualenv on my shared server space. This is the part of the code that seems to have trouble:
```python
#!/foo/env/bin/python3
import re
import tweepy, time, sys

argfile = str(sys.argv[1])
filename=open(argfile, 'r')
f=filename.readlines()
filename.close()
```
This is the error I get:

```
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfe in position 0: ordinal not in range(128)
```

The error specifically points to `f=filename.readlines()` as the source. Any idea what might be wrong? Thanks.
The readlines() method returns a list containing each line in the file as a list item.
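As a quick illustration of that behaviour, here is a minimal sketch that uses `io.StringIO` as a stand-in for a real file (an assumption made only to keep the example self-contained; a text file returned by `open()` behaves the same way). Note that each item keeps its trailing newline, and the last line has none if the data does not end with one:

```python
import io

# io.StringIO stands in for a text file opened with open();
# readlines() behaves the same way on both.
buffer = io.StringIO("first line\nsecond line\nlast line")
lines = buffer.readlines()
print(lines)  # ['first line\n', 'second line\n', 'last line']
```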
The Python "UnicodeDecodeError: 'ascii' codec can't decode byte in position" error occurs when the `ascii` codec is used to decode bytes that were encoded with a different codec. To solve the error, specify the correct encoding, e.g. `utf-8`.
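A minimal sketch of how the error arises, using an arbitrary example string (`café`, chosen here purely for illustration): the same bytes fail under the ASCII codec and decode cleanly with the codec that actually produced them.

```python
raw = "café".encode("utf-8")  # b'caf\xc3\xa9': the é becomes two non-ASCII bytes

# Decoding with the wrong codec raises UnicodeDecodeError.
message = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)  # 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

# Decoding with the codec that produced the bytes works.
text = raw.decode("utf-8")
print(text)  # café
```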
I think the best answer (in Python 3) is to use the `errors=` parameter:
```python
with open('evil_unicode.txt', 'r', errors='replace') as f:
    lines = f.readlines()
```
Proof:
```
>>> s = b'\xe5abc\nline2\nline3'
>>> with open('evil_unicode.txt','wb') as f:
...     f.write(s)
...
16
>>> with open('evil_unicode.txt', 'r') as f:
...     lines = f.readlines()
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 0: invalid continuation byte
>>> with open('evil_unicode.txt', 'r', errors='replace') as f:
...     lines = f.readlines()
...
>>> lines
['�abc\n', 'line2\n', 'line3']
```
Note that `errors=` can be, for example, `'replace'` or `'ignore'`. Here's what `'ignore'` looks like:
```
>>> with open('evil_unicode.txt', 'r', errors='ignore') as f:
...     lines = f.readlines()
...
>>> lines
['abc\n', 'line2\n', 'line3']
```
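Your default encoding appears to be ASCII, whereas the input is in some other encoding, most often UTF-8. (Strictly speaking, a leading 0xfe byte never occurs in valid UTF-8; it commonly begins a UTF-16 byte order mark, so check which encoding actually produced the file.) When Python hits non-ASCII bytes in the input, it raises the exception. It's not so much that `readlines` itself is responsible for the problem; rather, it triggers the read+decode, and the decode is what fails.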
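To see that the decode step, not `readlines` itself, is what fails, here is a small sketch. It assumes a throwaway temporary file holding the same leading 0xfe byte as the question's traceback: opened in binary mode, the lines come back as `bytes` without complaint, and the familiar error only appears once those bytes are decoded as ASCII. The last line prints the preferred locale encoding, which on most Python 3 setups is what text-mode `open()` uses when no `encoding` is passed.

```python
import locale
import os
import tempfile

# Write a throwaway sample file whose first byte (0xfe) is not ASCII.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"\xfeabc\nline2\n")

# Binary mode does no decoding, so readlines() succeeds and returns bytes.
with open(path, "rb") as raw_file:
    raw_lines = raw_file.readlines()
os.remove(path)
print(raw_lines)  # [b'\xfeabc\n', b'line2\n']

# The exception only appears once those bytes are decoded as ASCII.
message = None
try:
    raw_lines[0].decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)  # 'ascii' codec can't decode byte 0xfe in position 0: ordinal not in range(128)

# On most Python 3 setups, text-mode open() falls back to this encoding.
print(locale.getpreferredencoding(False))
```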
It's an easy fix, though: the built-in `open` in Python 3 lets you pass the known `encoding` of an input, replacing the default (ASCII in your case) with any other recognized encoding. Providing it lets you keep reading `str` objects (rather than the significantly different raw binary `bytes` objects) while Python does the work of converting raw bytes from disk into true text data:
```python
# Using a with statement closes the file for us without needing to remember
# to close it explicitly, and closes it even when exceptions occur
with open(argfile, encoding='utf-8') as inf:
    f = inf.readlines()
```
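If you are not sure which encoding produced the file, one rough heuristic is to look for a byte order mark (BOM) before choosing the `encoding` argument; a UTF-16 file, for instance, usually starts with the bytes 0xfe 0xff or 0xff 0xfe. The helper `sniff_bom_encoding` below is hypothetical (not part of the standard library or of the code above); it only recognizes UTF-8 and UTF-16 BOMs and falls back to UTF-8 otherwise, so treat it as a sketch rather than a real encoding detector.

```python
import codecs
import os
import tempfile


def sniff_bom_encoding(path, default="utf-8"):
    """Hypothetical helper: pick an encoding from a leading byte order mark.

    Only UTF-8 and UTF-16 BOMs are checked; files without one fall back to
    `default`. UTF-32 BOMs are not handled.
    """
    with open(path, "rb") as raw:
        head = raw.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the BOM while decoding
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM to choose the byte order
    return default


# Usage: build a UTF-16 big-endian sample file, which starts with 0xfe 0xff.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(codecs.BOM_UTF16_BE + "hello\nworld\n".encode("utf-16-be"))

encoding = sniff_bom_encoding(path)
with open(path, encoding=encoding) as inf:
    f = inf.readlines()
os.remove(path)

print(encoding, f)  # utf-16 ['hello\n', 'world\n']
```

Files without a BOM still need their encoding to be known ahead of time; the check only helps when a BOM is actually present.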