I can't create a UTF-8 CSV file in Python.
I'm reading the csv module's docs, and the examples section says:
For all other encodings the following UnicodeReader and UnicodeWriter classes can be used. They take an additional encoding parameter in their constructor and make sure that the data passes the real reader or writer encoded as UTF-8:
Ok. So I have this code:
values = (unicode("Ñ", "utf-8"), unicode("é", "utf-8"))
f = codecs.open('eggs.csv', 'w', encoding="utf-8")
writer = UnicodeWriter(f)
writer.writerow(values)
And I keep getting this error:
line 159, in writerow
self.stream.write(data)
File "/usr/lib/python2.6/codecs.py", line 686, in write
return self.writer.write(data)
File "/usr/lib/python2.6/codecs.py", line 351, in write
data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 22: ordinal not in range(128)
Can someone please shed some light on what I'm doing wrong? I set the encoding everywhere before calling the UnicodeWriter class.
For reference, this is the UnicodeWriter class from the docs:
import csv, codecs, cStringIO

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
You don't have to use codecs.open; UnicodeWriter takes Unicode input and takes care of encoding everything into UTF-8. When UnicodeWriter writes into the file handle you passed to it, everything is already in UTF-8 encoding (therefore it works with a normal file you opened with open).
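A minimal sketch of that "encode once, write to a plain file" idea. It is written for Python 3 rather than the Python 2.6 in the question, so treat it as an adaptation and not the recipe from the docs. The helper name write_rows and the use of io.StringIO in place of cStringIO are assumptions made for illustration, and the per-cell s.encode("utf-8") step is dropped because the Python 3 csv module accepts text directly.

```python
import codecs
import csv
import io


class UnicodeWriter:
    """Python 3 adaptation: each row is encoded exactly once."""

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        self.queue = io.StringIO()      # text buffer the csv module writes into
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f                 # plain binary handle, opened with "wb"
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow(row)
        data = self.queue.getvalue()
        # The only encoding step: text -> bytes in the target encoding.
        self.stream.write(self.encoder.encode(data))
        # Empty the queue and rewind it for the next row.
        self.queue.truncate(0)
        self.queue.seek(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


def write_rows(path, rows):
    # Hypothetical helper: a plain open(), no codecs.open() wrapper.
    with open(path, "wb") as f:
        UnicodeWriter(f).writerows(rows)
```

Calling write_rows("eggs.csv", [("Ñ", "é")]) leaves the file handle out of the encoding business entirely, which is the point the answer makes.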
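By using codecs.open, you essentially convert your Unicode objects to UTF-8 byte strings in UnicodeWriter, then try to re-encode these byte strings into UTF-8 again as if they were Unicode strings. To do that, Python 2 first decodes them implicitly with the default ASCII codec, which fails on the first non-ASCII byte (the 0xc3 in your traceback).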
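The failure can be reproduced in isolation. The sketch below is Python 3, where the ASCII decode has to be written out explicitly; in Python 2 the same step happens behind the scenes when a byte string is handed to an encoder. The function name double_encode is made up for this illustration.

```python
def double_encode(text):
    """Mimic encoding an already UTF-8-encoded byte string a second time."""
    utf8_bytes = text.encode("utf-8")   # first encoding, as done by UnicodeWriter
    # Python 2 performs this ASCII decode implicitly before the second encode;
    # Python 3 requires it to be spelled out.
    return utf8_bytes.decode("ascii").encode("utf-8")


try:
    double_encode("Ñ")
except UnicodeDecodeError as exc:
    error = exc                         # the 'ascii' codec rejects byte 0xc3
    print(error)
```

Pure ASCII input such as double_encode("abc") passes through unharmed, which is why this kind of bug often goes unnoticed until the first accented character shows up.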
As you have figured out, it works if you use plain open.
The reason for this is that you tried to encode to UTF-8 twice. Once in
f = codecs.open('eggs.csv', 'w', encoding="utf-8")
and then later in UnicodeWriter.writerow:
# ... and reencode it into the target encoding
data = self.encoder.encode(data)
To check this, use your original code and comment out that line.
Greetz