Create a UTF-8 CSV file in Python

Tags:

python

csv

encoding

utf-8

I can't create a UTF-8 CSV file in Python.

I'm trying to read the csv module's docs, and in the examples section, it says:

For all other encodings the following UnicodeReader and UnicodeWriter classes can be used. They take an additional encoding parameter in their constructor and make sure that the data passes the real reader or writer encoded as UTF-8:

Ok. So I have this code:

values = (unicode("Ñ", "utf-8"), unicode("é", "utf-8"))
f = codecs.open('eggs.csv', 'w', encoding="utf-8")
writer = UnicodeWriter(f)
writer.writerow(values)

And I keep getting this error:

line 159, in writerow
    self.stream.write(data)
  File "/usr/lib/python2.6/codecs.py", line 686, in write
    return self.writer.write(data)
  File "/usr/lib/python2.6/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 22: ordinal not in range(128)

Can someone please shed some light on what I am doing wrong? I set the encoding everywhere before calling the UnicodeWriter class:

import csv, codecs, cStringIO

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


asked Jun 21 '10 13:06

Somebody still uses you MS-DOS

2 Answers

You don't have to use codecs.open; UnicodeWriter takes Unicode input and takes care of encoding everything into UTF-8. When UnicodeWriter writes into the file handle you passed to it, everything is already in UTF-8 encoding (therefore it works with a normal file you opened with open).

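By using codecs.open, you essentially convert your Unicode objects to UTF-8 byte strings in UnicodeWriter, and then the codecs stream writer tries to encode those byte strings into UTF-8 again as if they were Unicode strings. To do that, Python 2 first decodes them implicitly with the ASCII codec, which fails on the first non-ASCII byte (0xc3 in your traceback) and raises the UnicodeDecodeError.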
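For readers on Python 3, where the csv module accepts text directly and the UnicodeWriter recipe is not needed, the same principle (encode exactly once, at the file layer) can be sketched as follows. This is a minimal illustration, not the recipe from the docs: `write_utf8_csv` is a hypothetical helper name, and the temporary directory is only there so the example does not touch the working directory.

```python
import csv
import os
import tempfile


def write_utf8_csv(path, rows):
    """Write rows of text to `path` as UTF-8 CSV, encoding exactly once."""
    # The file object does the only encoding step; newline="" lets the
    # csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)


# Same values as in the question, written to a temporary location.
path = os.path.join(tempfile.mkdtemp(), "eggs.csv")
write_utf8_csv(path, [("Ñ", "é")])

# Read the raw bytes back to confirm they are UTF-8.
with open(path, "rb") as f:
    raw = f.read()
```

On Python 2 the equivalent fix is the one described above: keep UnicodeWriter and pass it a file opened with plain open instead of codecs.open.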

answered Oct 06 '22 00:10

Tamás

As you have figured out, it works if you use plain open.

The reason for this is that you tried to encode UTF-8 twice. Once in

f = codecs.open('eggs.csv', 'w', encoding="utf-8")

and then later in UnicodeWriter.writerow

# ... and reencode it into the target encoding
data = self.encoder.encode(data)

To check this, use your original code and comment out that line.

Greetz

answered Oct 05 '22 23:10

KarlsFriend
