I have this string: 'Velcro Back Rest \xa36.99'. Note that it does not have a u prefix. It's just plain ASCII.
How do I convert it to unicode?
I tried this:
>>> unicode('Velcro Back Rest \xa36.99')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 17: ordinal not in range(128)
This answer explains it nicely, but I have the same question as the OP of that question. In the answer to that comment, Winston says "You should not encoding a string object ..."
However, the framework I am working with requires the value to be a unicode string. I use Scrapy and I have this line:
loader.add_value('name', product_name)
Here product_name contains the problematic string, and the call throws the error.
You need to specify an encoding to decode the bytes to Unicode with:
>>> 'Velcro Back Rest \xa36.99'.decode('latin1')
u'Velcro Back Rest \xa36.99'
>>> print 'Velcro Back Rest \xa36.99'.decode('latin1')
Velcro Back Rest £6.99
In this case I was able to guess the encoding from experience. You need to supply the correct codec for each encoding you encounter. For web data, the encoding is usually given in the Content-Type header:
Content-Type: text/html; charset=iso-8859-1
where iso-8859-1 is the official standard name for the Latin 1 encoding, for example. Python recognizes latin1 as an alias for iso-8859-1.
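As a rough illustration of picking the codec from that header instead of guessing, here is a minimal sketch. The helper name charset_from_content_type and its 'utf-8' fallback are my own assumptions, not part of Scrapy or the standard library, and the header value is hard-coded rather than read from a real response:

```python
def charset_from_content_type(header_value, default='utf-8'):
    """Return the charset parameter of a Content-Type value, or `default`.

    Hypothetical helper for illustration; the fallback is an arbitrary choice.
    """
    # Parameters follow the media type, separated by semicolons.
    for part in header_value.split(';')[1:]:
        name, _, value = part.strip().partition('=')
        if name.strip().lower() == 'charset' and value:
            return value.strip().strip('"\'')
    return default


# The byte string from the question (b'' is a plain str on Python 2).
raw = b'Velcro Back Rest \xa36.99'

encoding = charset_from_content_type('text/html; charset=iso-8859-1')
text = raw.decode(encoding)  # a unicode object on Python 2, str on Python 3
```

The decoded text value is what you would then hand to loader.add_value('name', ...).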
Note that your input data is not plain ASCII. If it were, it would only use bytes in the range 0 through 127; \xa3 is 163 decimal, so it is outside the ASCII range.
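If you want to check this yourself, a small sketch like the following lists every byte that falls outside the ASCII range, together with its position. The variable names are just for illustration:

```python
# The byte string from the question (b'' is a plain str on Python 2).
raw = b'Velcro Back Rest \xa36.99'

# bytearray yields integers on both Python 2 and 3, so the comparison is safe.
non_ascii = [(index, byte) for index, byte in enumerate(bytearray(raw))
             if byte > 127]

print(non_ascii)  # [(17, 163)]
```

Position 17 is the same offset the UnicodeDecodeError in the question reports for byte 0xa3.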