I am new to Python and have a question about reading and writing CSV files. My file contains text in German, French, etc. With my code, the file is read correctly in Python, but when I write it to a new CSV file, the Unicode characters turn into strange characters.
The data is like:
And my code is:
import csv

f = open('xxx.csv', 'rb')
reader = csv.reader(f)
wt = open('lll.csv', 'wb')
writer = csv.writer(wt, quoting=csv.QUOTE_ALL)
wt.close()
f.close()
And the result is like:
What should I do to solve the problem?
Plain CSV files carry no encoding information, so Unicode characters are preserved only if the file is written and read with the same encoding, such as UTF-8.
Python 2 uses the str type to store bytes and the unicode type to store Unicode code points. All strings are str (that is, bytes) by default, and the default encoding is ASCII. So if an incoming file contains Cyrillic characters, Python 2 might fail, because ASCII cannot represent them.
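The failure described above can be reproduced in a few lines. This is only a minimal sketch, written in Python 3 syntax (where the split between text and bytes is explicit); the sample string is an arbitrary illustration, not data from the question:

```python
# Text (code points) must be encoded to bytes before it is written to a file.
text = "Привет"  # sample Cyrillic string, chosen only for illustration

# ASCII covers only code points 0-127, so encoding Cyrillic with it fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent every Unicode code point, so this round-trips cleanly.
utf8_bytes = text.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")

print(ascii_ok)               # False
print(round_tripped == text)  # True
```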
CSV UTF-8 (comma delimited) is CSV encoded as UTF-8, the 8-bit Unicode Transformation Format, which supports many special characters, including hieroglyphs and accented characters, and is backward compatible with ASCII.
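The backward compatibility with ASCII can be checked directly. The following is a small sketch in Python 3, with sample strings picked only for illustration:

```python
# Pure ASCII text produces identical bytes under ASCII and UTF-8.
ascii_text = "Germany"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# A non-ASCII character (here U+00E9, "e" with an acute accent)
# is encoded by UTF-8 as a multi-byte sequence.
accented = "\u00e9"
accented_bytes = accented.encode("utf-8")

print(same_bytes)      # True
print(accented_bytes)  # b'\xc3\xa9'
```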
Another alternative:
Use the code from the unicodecsv package ...
https://pypi.python.org/pypi/unicodecsv/
>>> import unicodecsv as csv
>>> from io import BytesIO
>>> f = BytesIO()
>>> w = csv.writer(f, encoding='utf-8')
>>> _ = w.writerow((u'é', u'ñ'))
>>> _ = f.seek(0)
>>> r = csv.reader(f, encoding='utf-8')
>>> next(r) == [u'é', u'ñ']
True
This module is API compatible with the standard library csv module.
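For comparison, on Python 3 the standard library csv module works on text directly, so the same round trip may not need an extra package. The sketch below assumes Python 3; the sample rows and the file name out.csv are hypothetical and stand in for the data from the question:

```python
import csv
import os
import tempfile

# Hypothetical sample rows containing non-ASCII characters.
rows = [["Name", "Country"], ["Müller", "Deutschland"], ["François", "France"]]

# Write to a temporary location so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "out.csv")

# Open in text mode with an explicit encoding; newline="" lets the csv
# module control line endings itself.
with open(path, "w", encoding="utf-8", newline="") as wt:
    writer = csv.writer(wt, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)

# Read the file back with the same encoding.
with open(path, "r", encoding="utf-8", newline="") as f:
    read_back = list(csv.reader(f))

print(read_back == rows)  # True
```

If the output is meant to be opened in a spreadsheet program, passing encoding="utf-8-sig" instead writes a byte order mark, which some programs use to detect UTF-8.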