UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3'

I have an Excel spreadsheet that I'm reading in that contains some £ signs.

When I try to read it in using the xlrd module, I get the following error:

    x = table.cell_value(row, col)
    x = x.decode("ISO-8859-1")

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 0: ordinal not in range(128)

If I change this to x.encode('utf-8'), it stops throwing an error, but unfortunately when I then write the data out somewhere else (as latin-1), the £ signs have all become garbled.

How can I fix this, and read the £ signs in correctly?

--- UPDATE ---

Some kind readers have suggested that I don't need to decode it at all, or that I can just encode it to Latin-1 when I need to. The problem is that I eventually need to write the data to a CSV file, and the csv module seems to object to the unencoded (unicode) strings.

If I don't encode or decode the data at all, then this happens (after I've added the string to an array called items):

    for item in items:
        #item = [x.encode('latin-1') for x in item]
        cleancsv.writerow(item)

      File "clean_up_barnet.py", line 104, in <module>
        cleancsv.writerow(item)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2022' in position 43: ordinal not in range(128)

I get the same error even if I uncomment the Latin-1 line.


asked Aug 27 '10 20:08

AP257

2 Answers

A very easy way around all the "'ascii' codec can't encode character…" issues with csv.writer is to use unicodecsv instead, a drop-in replacement for the csv module.

Install unicodecsv with pip; you can then use it exactly the same way, e.g.:

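    import unicodecsv

    # unicodecsv writes encoded bytes (UTF-8 by default), so open the file in binary mode
    with open('users.csv', 'wb') as f:
        w = unicodecsv.writer(f, encoding='utf-8')
        for user in User.objects.all().values_list('first_name', 'last_name', 'email', 'last_login'):
            w.writerow(user)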
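
On Python 3 the third-party package shouldn't be needed: the standard csv module accepts str directly, and the encoding is chosen when the file is opened. A minimal sketch under that assumption; `write_rows_csv` and the rows in the usage line are hypothetical stand-ins for the Django query above:

```python
import csv


def write_rows_csv(path, rows, encoding="utf-8"):
    """Write rows of str values to path; the file object does the encoding."""
    # newline="" is what the csv docs recommend for files handed to csv.writer,
    # so the writer's own \r\n line endings are not translated again.
    with open(path, "w", encoding=encoding, newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)
```

For example, write_rows_csv('users.csv', [('Zoë', 'Smith', 'zoe@example.com')]) writes one UTF-8-encoded line; passing a different encoding changes only the bytes on disk, not the code that builds the rows.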


answered Sep 29 '22 12:09

jturnbull

For what it's worth: I'm the author of xlrd.

Does xlrd produce unicode?
Option 1: Read the Unicode section at the bottom of the first screenful of the xlrd docs: "This module presents all text strings as Python unicode objects."
Option 2: Check for yourself with print type(text), repr(text)
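
Either way, the conclusion is that text coming out of xlrd is already decoded, so calling .decode() on it is the mistake. A minimal sketch in Python 3 terms (an assumption: the thread is Python 2, where `unicode` plays the role of `str`); `to_text` is a hypothetical helper, not part of xlrd:

```python
def to_text(value, encoding="latin-1"):
    """Return value as text, decoding only if it is still raw bytes."""
    if isinstance(value, str):
        # Already text (what xlrd hands back for string cells): nothing to decode.
        return value
    # Raw bytes must be decoded with the encoding they were actually written in.
    return value.decode(encoding)
```

In Python 3 the Option 2 check is spelled print(type(text), repr(text)); a string cell should show <class 'str'>.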

You say """If I rewrite this to x.encode('utf-8') it stops throwing an error, but unfortunately when I then write the data out somewhere else (as latin-1), the £ signs have all become garbled.""" Of course if you write UTF-8-encoded text to a device that's expecting latin1, it will be garbled. What did you expect?
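
The garbling is easy to reproduce: encode the POUND SIGN as UTF-8 and read the bytes back as latin1. A minimal sketch in Python 3 (an assumption; the thread itself is Python 2):

```python
pound = "\u00a3"  # U+00A3 POUND SIGN

utf8_bytes = pound.encode("utf-8")      # two bytes: 0xC2 0xA3
latin1_bytes = pound.encode("latin-1")  # one byte: 0xA3

# A latin1 reader maps each byte to one character, so 0xC2 shows up as an extra "Â".
garbled = utf8_bytes.decode("latin-1")

# Writing latin1 bytes for a latin1 reader round-trips cleanly.
roundtrip = latin1_bytes.decode("latin-1")
```

Here garbled is "Â£" while roundtrip is "£": the data was never wrong, only the encoding on the way out.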

You say in your edit: """I get the same error even if I uncomment the Latin-1 line""". This is very unlikely; it is much more likely that you got a slightly different error (mentioning the latin1 codec instead of the ascii codec) in a different source line (the uncommented latin1 line instead of the writerow line). Reading error messages carefully aids understanding.
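
One way to read such errors carefully is to inspect the exception's attributes: which codec failed, on which character, and at what position. A minimal sketch in Python 3; `describe_encode_error` is a hypothetical helper:

```python
def describe_encode_error(text, encoding):
    """Return (codec, offending text, position) if text cannot be encoded, else None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.encoding names the codec that failed; start/end index into exc.object.
        return exc.encoding, exc.object[exc.start:exc.end], exc.start
    return None
```

Encoding a string containing U+2022 with "latin-1" reports the latin-1 codec (not ascii) together with the bullet's position, which is the "slightly different error" described above.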

Your problem here is that in general your data is NOT encodable in latin1; very little real-world data is. Your POUND SIGN is encodable in latin1, but that's not all your non-ASCII data. The problematic character is U+2022 BULLET which is not encodable in latin1.
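
A quick way to check whether your data fits an encoding is to list the characters that don't. A minimal sketch in Python 3; `unencodable_chars` is a hypothetical helper, and the sample strings in the usage note are made up:

```python
def unencodable_chars(text, encoding):
    """Return the distinct characters of text that encoding cannot represent, in order."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad
```

For "£5 • item" this returns only the bullet for "latin-1", nothing for "cp1252", and both characters for "ascii".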

It would have helped you get a better answer sooner if you had mentioned up front that you were working on Mac OS X ... the usual suspect for a CSV-suitable encoding is cp1252 (Windows), not mac-roman.
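
Following that advice, the fix is to pick an encoding that covers both characters and do the encoding at the file boundary. A minimal sketch in Python 3 (an assumption; on Python 2 the csv module needs byte strings, which is what unicodecsv wraps); `rows_to_csv_bytes` is hypothetical and writes to memory so the result is easy to inspect:

```python
import csv
import io


def rows_to_csv_bytes(rows, encoding="cp1252"):
    """Serialize rows of str values as CSV bytes in the given encoding."""
    buf = io.BytesIO()
    # The text wrapper does the str -> bytes step; csv.writer only ever sees str.
    text = io.TextIOWrapper(buf, encoding=encoding, newline="")
    csv.writer(text).writerows(rows)
    text.flush()
    data = buf.getvalue()
    text.detach()  # release buf without closing it
    return data
```

With cp1252 the row ["£5", "• item"] becomes b'\xa35,\x95 item\r\n'; with encoding="latin-1" the same call raises UnicodeEncodeError on the bullet, which is the failure described in the question.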

answered Sep 29 '22 13:09

John Machin
