
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3'


I have an Excel spreadsheet that I'm reading in that contains some £ signs.

When I try to read it in using the xlrd module, I get the following error:

    x = table.cell_value(row, col)
    x = x.decode("ISO-8859-1")

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 0: ordinal not in range(128)

If I rewrite this to x.encode('utf-8') it stops throwing an error, but unfortunately when I then write the data out somewhere else (as latin-1), the £ signs have all become garbled.

How can I fix this, and read the £ signs in correctly?

--- UPDATE ---

Some kind readers have suggested that I don't need to decode it at all, or that I can just encode it to Latin-1 when I need to. The problem with this is that I need to write the data to a CSV file eventually, and it seems to object to the raw strings.

If I don't encode or decode the data at all, then this happens (after I've added the string to an array called items):

    for item in items:
        #item = [x.encode('latin-1') for x in item]
        cleancsv.writerow(item)

    File "clean_up_barnet.py", line 104, in <module>
        cleancsv.writerow(item)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2022' in position 43: ordinal not in range(128)

I get the same error even if I uncomment the Latin-1 line.

AP257 asked Aug 27 '10 20:08



2 Answers

A very easy way around all the "'ascii' codec can't encode character…" issues with csv.writer is to use unicodecsv instead, a drop-in replacement for the standard csv module.

Install unicodecsv with pip and then you can use it in exactly the same way, e.g.:

    import unicodecsv

    # unicodecsv writes encoded bytes, so open the file in binary mode
    with open('users.csv', 'wb') as f:
        w = unicodecsv.writer(f, encoding='utf-8')
        for user in User.objects.all().values_list('first_name', 'last_name', 'email', 'last_login'):
            w.writerow(user)
jturnbull answered Sep 29 '22 12:09


For what it's worth: I'm the author of xlrd.

Does xlrd produce unicode?
Option 1: Read the Unicode section at the bottom of the first screenful of xlrd doc: This module presents all text strings as Python unicode objects.
Option 2: print type(text), repr(text)
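
If xlrd is already returning unicode objects, the original traceback has a likely explanation: in Python 2, calling .decode() on a unicode object first encodes it to bytes with the default ascii codec, and that implicit step is what fails on the pound sign. The sketch below is written for Python 3 (an assumption; the question uses Python 2) and makes the hidden encode explicit. The cell value is a made-up stand-in, not data from the spreadsheet.

```python
# Python 3 sketch. In Python 2 the encode step below happens implicitly
# when .decode() is called on a unicode object.
text = "\u00a3100"  # POUND SIGN plus digits, a hypothetical cell value

error = None
try:
    text.encode("ascii")  # the step Python 2 performs behind the scenes
except UnicodeEncodeError as exc:
    error = exc
    # Inspect the value the same way "Option 2" suggests
    print(type(text).__name__, ascii(text))
    print(exc)
```

The fix in that case is simply not to decode: the value is already text, and it only needs encoding once, at the point where it is written out.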

You say """If I rewrite this to x.encode('utf-8') it stops throwing an error, but unfortunately when I then write the data out somewhere else (as latin-1), the £ signs have all become garbled.""" Of course if you write UTF-8-encoded text to a device that's expecting latin1, it will be garbled. What did you expect?
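
A small sketch of that garbling, written for Python 3 (an assumption; the question uses Python 2). It encodes a pound sign as UTF-8 and then reads the bytes back as latin1, which is roughly what a latin1-expecting consumer would do.

```python
pound = "\u00a3"  # POUND SIGN

utf8_bytes = pound.encode("utf-8")       # two bytes in UTF-8
garbled = utf8_bytes.decode("latin-1")   # each byte read as its own character
latin1_bytes = pound.encode("latin-1")   # one byte in latin1

print(utf8_bytes, ascii(garbled), latin1_bytes)
```

The two UTF-8 bytes come back as two separate latin1 characters, so the encoding used when writing has to match what the reader expects.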

You say in your edit: """I get the same error even if I uncomment the Latin-1 line""". This is very unlikely -- much more likely is that you got a slightly different error (mentioning the latin1 codec instead of the ascii codec) in a different source line (the uncommented latin1 line instead of the writerow line). Reading error messages carefully aids understanding.

Your problem here is that in general your data is NOT encodable in latin1; very little real-world data is. Your POUND SIGN is encodable in latin1, but that's not all your non-ASCII data. The problematic character is U+2022 BULLET which is not encodable in latin1.
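
One way to check this kind of claim against your own data is to list the characters a codec cannot encode. The helper below is a hypothetical illustration in Python 3 (not part of xlrd or the csv module), and the sample string is invented.

```python
def unencodable(text, encoding):
    """Return the characters in text that the given codec cannot encode."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


sample = "\u00a35 \u2022 item"  # POUND SIGN, BULLET, and some ASCII

# Report offending code points rather than printing the raw characters
print([hex(ord(c)) for c in unencodable(sample, "latin-1")])
print([hex(ord(c)) for c in unencodable(sample, "ascii")])
```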

It would have helped you get a better answer sooner if you had mentioned up front that you were working on Mac OS X ... the usual suspect for a CSV-suitable encoding is cp1252 (Windows), not mac-roman.
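
As a rough sketch of the cp1252 suggestion, assuming Python 3 and its standard csv module (the question itself uses Python 2), the rows can be kept as text and encoded only at the point of writing. The rows here are invented, and an in-memory buffer stands in for the output file.

```python
import csv
import io

# Hypothetical rows containing both POUND SIGN and BULLET
rows = [["price", "note"], ["\u00a35", "\u2022 first item"]]

buffer = io.BytesIO()
# newline="" lets the csv module control line endings itself
wrapper = io.TextIOWrapper(buffer, encoding="cp1252", newline="")
writer = csv.writer(wrapper)
writer.writerows(rows)
wrapper.flush()

data = buffer.getvalue()
print(data)
```

With a real file, the equivalent would be open(path, "w", encoding="cp1252", newline=""). Both characters fit in cp1252, whereas the bullet would fail under latin1.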

John Machin answered Sep 29 '22 13:09