So I have a .txt file from Google Docs containing some lines from David Foster Wallace's "Oblivion". Using:
    with open("oblivion.txt", "r", 0) as bookFile:
        wordList = []
        for line in bookFile:
            wordList.append(line)
and then returning and printing wordList, I get:

    "surgery on the crow\xe2\x80\x99s feet around her eyes."
(and a lot of the text gets truncated). However, if instead of appending to wordList I simply do
    for line in bookFile:
        print line
everything turns out fine! The same goes for .read()'ing the file - the resulting str doesn't have the crazy byte representation, but then I can't manipulate it the way I want to.
Where do I .encode() or .decode() or what? Using Python 2 because 3 was giving me some I/O buffer error. Thanks.
Try calling open with encoding set to utf-8:
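    with open("oblivion.txt", "r", encoding='utf-8') as bookFile:
        wordList = bookFile.readlines()

Note that the encoding argument only exists on Python 3's built-in open(). On Python 2, use io.open("oblivion.txt", "r", encoding='utf-8') instead, which accepts the same arguments and returns decoded text. The I/O buffer error you saw on Python 3 most likely comes from the 0 (unbuffered) argument: Python 3 does not allow unbuffered text I/O, so drop that argument.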
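A minimal sketch of the same idea that runs on both Python 2.6+ and Python 3, because it uses io.open (available in both). The temporary file path and the sample line are hypothetical stand-ins for oblivion.txt. The sketch also shows where .decode() fits: the bytes \xe2\x80\x99 in the question are the UTF-8 encoding of the curly apostrophe U+2019, and decoding them as UTF-8 gives that character back.

```python
import io
import os
import tempfile

# Hypothetical stand-in for oblivion.txt: one line containing a curly
# apostrophe (U+2019), which UTF-8 stores as the bytes E2 80 99.
sample = u"surgery on the crow\u2019s feet around her eyes.\n"
path = os.path.join(tempfile.mkdtemp(), "oblivion.txt")

# Write the sample as UTF-8 so the file holds the same bytes seen in the question.
with io.open(path, "w", encoding="utf-8") as out:
    out.write(sample)

# io.open decodes UTF-8 while reading, on Python 2 and Python 3 alike,
# so every entry in wordList is already decoded text, not raw bytes.
with io.open(path, "r", encoding="utf-8") as bookFile:
    wordList = bookFile.readlines()

# Decoding raw bytes by hand gives the same text: .decode() turns
# bytes into text, and .encode() turns text back into bytes.
raw = b"crow\xe2\x80\x99s"
text = raw.decode("utf-8")
print(text == u"crow\u2019s")  # True
```

Reading with an explicit encoding is usually simpler than calling .decode() on every line yourself, since the file object does the conversion once at the boundary and the rest of your code only ever sees text.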