Python: Unicode and "\xe2\x80\x99" driving me batty

Question

So I have a .txt file from Google Docs containing some lines from David Foster Wallace's "Oblivion". Using:

with open("oblivion.txt", "r", 0) as bookFile:
    wordList = []
    for line in bookFile:
        wordList.append(line)

and returning & printing the wordList I get:

"surgery on the crow\xe2\x80\x99s feet around her eyes."

(and it truncates a lot of the text). However, if instead of appending to wordList I simply do

for line in bookFile:
    print line

everything turns out fine! The same goes for .read()'ing the file - the resulting str doesn't have the crazy byte representation, but then I can't manipulate it the way I want to.

Where do I .encode() or .decode() or what? ~~Using Python 2 because 3 was giving me some I/O buffer error.~~ Thanks.

Rahul · Accepted Answer

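The bytes \xe2\x80\x99 are the UTF-8 encoding of the right single quotation mark (U+2019). Printing a list shows the repr() of each item, which escapes those bytes; printing a single line writes them out as-is, so the terminal renders the apostrophe. To get decoded text, open the file with the encoding set to utf-8 (this works in Python 3; in Python 2 the built-in open does not accept encoding, so use io.open there):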

with open("oblivion.txt", "r", encoding='utf-8') as bookFile:
    wordList = bookFile.readlines()
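
To see the difference between raw bytes and decoded text, here is a minimal Python 3 sketch. It does not use the asker's real oblivion.txt: it writes the one line quoted in the question to a temporary file as a stand-in, and the variable names are only illustrative.

```python
import os
import tempfile

# The line from the question, with U+2019 (right single quotation mark).
text = "surgery on the crow\u2019s feet around her eyes.\n"

# Stand-in for oblivion.txt: a temporary file holding the line as UTF-8 bytes.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(text.encode("utf-8"))
    path = tmp.name

# Binary mode returns undecoded bytes; the apostrophe is three UTF-8 bytes.
with open(path, "rb") as raw_file:
    raw_lines = raw_file.readlines()

# Text mode with an explicit encoding returns decoded strings.
with open(path, "r", encoding="utf-8") as book_file:
    word_list = book_file.readlines()

os.remove(path)

# repr() of the bytes shows the escapes seen in the question.
print(repr(raw_lines[0]))
# The decoded line holds one character, U+2019, where the three bytes were.
print("\u2019" in word_list[0])  # True
print(len(raw_lines[0]) - len(word_list[0]))  # 2
```

The text was never damaged in the original code; only the list's display escaped the bytes. Decoding on read is what makes each apostrophe a single character for later string handling.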

Tags: python, character-encoding, unicode

Luke McPuke
