First, you need to decode the file contents, not encode them. Second, the csv module doesn't handle unicode strings in Python 2.7, so having decoded your data you need to convert it back to UTF-8. Finally, csv.reader expects an iterable of the file's lines, not one big string with linebreaks in it.
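A minimal Python 2.7 sketch of that sequence, assuming a UTF-8 encoded file named 'info.csv' (the name is just illustrative):

import csv

raw = open('info.csv', 'rb').read()
# decode the file contents first ('utf-8-sig' also strips a BOM, if one is present)...
text = raw.decode('utf-8-sig')
# ...then re-encode to UTF-8 bytes, because the Python 2.7 csv module does not accept unicode
lines = text.encode('utf-8').splitlines()
# csv.reader wants an iterable of lines, not one big string with linebreaks
for row in csv.reader(lines):
    print row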
When opening a file for reading, Python needs to know exactly how the file should be opened. For reading there are two access modes: text mode and binary mode. The respective flags are 'r' and 'rb', and they are specified when opening a file with the built-in open() method.
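For example, a two-line sketch (Python 3 semantics, with 'info.txt' as a placeholder name):

text = open('info.txt', 'r').read()   # text mode: returns a decoded str
data = open('info.txt', 'rb').read()  # binary mode: returns raw, undecoded bytes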
I can't find a duplicate of this for Python 3, which handles encodings differently from Python 2. So here's the answer: instead of opening the file with the default encoding (which on most systems is 'utf-8'), use 'utf-8-sig', which expects and strips off the UTF-8 Byte Order Mark, which is what shows up as the stray '\ufeff' at the start of your data.
That is, instead of
data = open('info.txt')
Do
data = open('info.txt', encoding='utf-8-sig')
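For instance, reading the first line back then shows the BOM has been stripped (a small sketch, still using the 'info.txt' name from above):

with open('info.txt', encoding='utf-8-sig') as data:
    first = data.readline()
# first no longer begins with the invisible '\ufeff' character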
Note that if you're on Python 2, you should see e.g. Python, Encoding output to UTF-8 and Convert UTF-8 with BOM to UTF-8 with no BOM in Python. You'll need to do some shenanigans with codecs or with str.decode for this to work right in Python 2. But in Python 3, all you need to do is set the encoding= parameter when you open the file.
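A rough Python 2 sketch of those two routes, again assuming a file named 'info.txt':

import codecs

# via codecs: open the file with the BOM-stripping codec
with codecs.open('info.txt', encoding='utf-8-sig') as f:
    text = f.read()              # a unicode string, BOM already removed

# or via str.decode: read the raw bytes and decode them yourself
raw = open('info.txt', 'rb').read()
text = raw.decode('utf-8-sig')   # likewise unicode, without the BOM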
I had a very similar problem when dealing with Excel CSV files. Initially I had saved my file from the drop-down choices as a .csv UTF-8 (comma delimited) file. Then I saved it as just a .csv (comma delimited) file and all was well. Perhaps a similar issue applies to a .txt file.