Python: Convert Unicode to ASCII without errors for CSV file

Tags: python, csv, unicode, ascii, diacritics

I've been reading all the questions about converting Unicode to CSV in Python here on Stack Overflow and I'm still lost. Every time, I receive a "UnicodeEncodeError: 'ascii' codec can't encode character u'\xd1' in position 12: ordinal not in range(128)"

buffer=cStringIO.StringIO()
writer=csv.writer(buffer, csv.excel)
cr.execute(query, query_param)
while (1):
    row = cr.fetchone()
    writer.writerow([s.encode('ascii','ignore') for s in row])

The value of row is

(56, u"LIMPIADOR BA\xd1O 1'5 L")

where the value of \xd1 in the database is ñ, an n with a diacritical tilde used in Spanish. At first I tried to convert the value to something valid in ASCII, but after losing so much time I'm now only trying to ignore those characters (I suppose I'd have the same problem with accented vowels).

I'd like to save the value to the CSV, preferably with the ñ ("LIMPIADOR BAÑO 1'5 L"), but if not possible, at least be able to save it ("LIMPIADOR BAO 1'5 L").


asked Jan 10 '11 19:01

Sergi

1 Answer

Correct, ñ is not a valid ASCII character, so you can't encode it to ASCII. You can, as your code above does, ignore such characters. Another approach, removing the accents, is described in the question "What is the best way to remove accents in a Python unicode string?"
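To make the difference between the two approaches concrete, here is a minimal sketch. It assumes Python 3 (the question uses Python 2, where the string literal would need a `u` prefix), and the helper names `drop_non_ascii` and `strip_accents` are made up for illustration.

```python
import unicodedata

def drop_non_ascii(text):
    """Discard every character that cannot be encoded as ASCII."""
    return text.encode("ascii", "ignore").decode("ascii")

def strip_accents(text):
    """Decompose accented characters, then discard the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

name = "LIMPIADOR BA\xd1O 1'5 L"
dropped = drop_non_ascii(name)   # the Ñ disappears entirely
stripped = strip_accents(name)   # the Ñ is reduced to a plain N
```

Either way the original word is changed, which is the drawback discussed next.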

But note that both techniques can have bad effects, such as turning a word into one that means something different. So the best option is to keep the accents. You then can't use ASCII, but you can use another encoding. UTF-8 is the safe bet. Latin-1 (ISO-8859-1) is a common one, but it includes only Western European characters. CP-1252 is common on Windows, and so on.

So just switch "ascii" for whatever encoding you want.
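As a rough illustration of how the choice of encoding matters, the sketch below encodes Ñ and, as an extra example that is not from the question, the euro sign. It assumes Python 3; the variable names are only for illustration.

```python
enye = "\xd1"    # Ñ
euro = "\u20ac"  # € (not part of Latin-1)

utf8_bytes = enye.encode("utf-8")      # two bytes in UTF-8
latin1_bytes = enye.encode("latin-1")  # one byte in Latin-1
cp1252_bytes = enye.encode("cp1252")   # same single byte in CP-1252

# Latin-1 cannot represent the euro sign, CP-1252 and UTF-8 can.
try:
    euro.encode("latin-1")
    euro_fits_latin1 = True
except UnicodeEncodeError:
    euro_fits_latin1 = False

euro_cp1252 = euro.encode("cp1252")
euro_utf8 = euro.encode("utf-8")
```

UTF-8 can encode any Unicode character, which is why it is the safe default when you do not know what the data will contain.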

Your actual code, according to your comment, is:

writer.writerow([s.encode('utf8') if type(s) is unicode else s for s in row])

where

row = (56, u"LIMPIADOR BA\xd1O 1'5 L")

Now, I believe that should work, but apparently it doesn't. I think unicode gets passed into the csv writer by mistake anyway. Unwrap that long line into its parts:

col1, col2 = row # Use the names of what is actually there instead
row = col1, col2.encode('utf8')
writer.writerow(row)

Now your real error will no longer be hidden by the fact that you put everything on one line. This could probably also have been avoided if you had included a proper traceback.
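For comparison, here is a sketch of the same write under Python 3, which is an assumption on my part since the question uses Python 2. In Python 3 there is no separate `unicode` type and the csv module accepts text directly, so the encoding is applied once to the finished output rather than to each cell. The `rows` list stands in for the database cursor.

```python
import csv
import io

# Stand-in for the rows fetched from the database cursor.
rows = [(56, "LIMPIADOR BA\xd1O 1'5 L")]

buffer = io.StringIO(newline="")
writer = csv.writer(buffer, dialect="excel")
for row in rows:
    # Non-string cells such as the integer 56 are converted by the writer.
    writer.writerow(row)

csv_text = buffer.getvalue()
csv_bytes = csv_text.encode("utf-8")  # encode once, at the output boundary
```

When writing straight to disk, opening the file with `open(path, "w", encoding="utf-8", newline="")` has the same effect without the intermediate buffer.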


answered Nov 14 '22 22:11

Lennart Regebro
