
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3

I have this string: 'Velcro Back Rest \xa36.99'. Note that it does not have a u prefix. It's just plain ASCII.

How do I convert it to unicode?

I tried this,

>>> unicode('Velcro Back Rest \xa36.99')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 17: ordinal not in range(128)

This answer explains it nicely, but I have the same question as the OP of that question. In the reply to that comment, Winston says "You should not encoding a string object ..."

But the framework I am working with requires the value to be a unicode string. I use Scrapy and I have this line:

loader.add_value('name', product_name)

Here product_name contains that problematic string and it throws the error.

Genghis Khan asked Jun 20 '13 17:06



1 Answer

You need to specify an encoding to decode the bytes to Unicode with:

>>> 'Velcro Back Rest \xa36.99'.decode('latin1')
u'Velcro Back Rest \xa36.99'
>>> print 'Velcro Back Rest \xa36.99'.decode('latin1')
Velcro Back Rest £6.99

In this case, I was able to guess the encoding from experience. In general, you need to provide the codec that matches the encoding the data was actually written in. For web data, that is usually included in the form of the Content-Type header:

Content-Type: text/html; charset=iso-8859-1

where iso-8859-1 is the official standard name for the Latin 1 encoding, for example. Python recognizes latin1 as an alias for iso-8859-1.
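As a rough sketch of how that header could drive the decoding: the helper names charset_from_content_type and decode_body below are made up for illustration, and falling back to latin1 when no charset is declared is my own assumption, not something the header guarantees. The byte literal uses a b prefix so the snippet runs unchanged on Python 2.7 and Python 3.

```python
import codecs
import re


def charset_from_content_type(header_value, default="latin1"):
    """Return the charset named in a Content-Type value, or `default`.

    Hypothetical helper; the latin1 fallback is an assumption.
    """
    match = re.search(r'charset="?([\w.:-]+)"?', header_value, re.IGNORECASE)
    return match.group(1) if match else default


def decode_body(raw_bytes, content_type):
    """Decode raw bytes with the charset declared in the header value."""
    charset = charset_from_content_type(content_type)
    # Raises LookupError if Python has no codec registered under that name.
    return raw_bytes.decode(charset)


header = "text/html; charset=iso-8859-1"
text = decode_body(b"Velcro Back Rest \xa36.99", header)

# Both spellings resolve to the same codec in Python's registry.
same_codec = codecs.lookup("latin1").name == codecs.lookup("iso-8859-1").name
```

Here text ends up as the Unicode string u'Velcro Back Rest \xa36.99', and same_codec is True because latin1 is only an alias. If the server declares the wrong charset, the decode may still succeed but produce the wrong characters, so the header is a strong hint and not a proof.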

Note that your input data is not plain ASCII. If it was, it'd only use bytes in the range 0 through to 127; \xa3 is 163 decimal, so outside of the ASCII range.
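If you want to check this for yourself, something like the following locates the first byte outside the ASCII range. first_non_ascii is just a name I picked for the example, and going through bytearray is only there so the loop yields integers on both Python 2.7 and Python 3.

```python
def first_non_ascii(raw_bytes):
    """Return (position, byte value) of the first byte above 127, or None."""
    for position, value in enumerate(bytearray(raw_bytes)):
        if value > 127:
            return position, value
    return None


result = first_non_ascii(b"Velcro Back Rest \xa36.99")
# result == (17, 163): the same position 17 the traceback reports, and 0xa3 == 163
```

A string for which this returns None really is plain ASCII and decodes the same way under ascii, latin1 and utf-8.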

Martijn Pieters answered Oct 16 '22 09:10
