I'm administering some Python code in which I now see an error in the logs:
Traceback (most recent call last):
  File "./app/core.py", line 772, in scrapeEmail
    l.info('EMAIL SUBJECT: ', header['value'])
  File "./app/__init__.py", line 44, in info
    logging.info(str(datetime.utcnow()) + ' INFO ' + caller.filename + ':' + str(caller.lineno) + ' - ' + ' '.join([str(x) for x in args]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 25: ordinal not in range(128)
which I guess means that header['value'] contains differently encoded characters.
I searched around, and this SO answer suggests that I "put .encode('utf-8') at the end of the object for recent versions of Python".
This raised two questions for me:
1. Where exactly should I put .encode('utf-8')? On x or on str(x)? So should it be str(x.encode('utf-8')) or str(x).encode('utf-8')?
2. Can I use .encode('utf-8') in Python 2.7?
Normally I would simply try it, but it is not easy (actually impossible) to find the string on which the error occurred. So I can't really test it.
A little help would be greatly appreciated here.
I suggest you first get a clear picture of the relationship between unicode and other encodings (e.g. GB2312, GBK). After that, encoding and decoding should no longer be a major problem. :)
The following diagram shows the relationship. Once you get the main point, you will know when and how to encode and decode in your code. :)
 ---------                 -----------                 ----------
|         |  1.decode(A)  |           |  2.encode(B)  |          |
|    A    | ------------> |  unicode  | ------------> |    B     |
|         | <------------ |           | <------------ |          |
|         |  4.encode(A)  |           |  3.decode(B)  |          |
 ---------                 -----------                 ----------
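The four numbered arrows can be sketched in Python roughly like this. Latin-1 and UTF-8 stand in for A and B purely for illustration, and the sample bytes are made up. The snippet is written so that it should run unchanged on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Illustrative only: A = Latin-1 and B = UTF-8 are assumptions, not taken from the question.

raw_a = b'Pr\xeat'                       # bytes in encoding A (0xEA is the Latin-1 byte for "e with circumflex")

text = raw_a.decode('latin-1')           # 1. decode(A): bytes in A -> unicode
raw_b = text.encode('utf-8')             # 2. encode(B): unicode -> bytes in B

back_to_text = raw_b.decode('utf-8')     # 3. decode(B): bytes in B -> unicode
back_to_a = back_to_text.encode('latin-1')  # 4. encode(A): unicode -> bytes in A
```

The point is that you never go from A to B directly: unicode is always the step in the middle.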
So, according to the diagram, you need to know what encoding your data is in now and what encoding you want to convert it to, and then follow the arrows in the diagram.
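Applied to the question: .encode('utf-8') does exist in Python 2.7, and it has to be called on x itself, before any str() call. In Python 2, str(x) on a unicode object implicitly encodes it with the ASCII codec, which is exactly what raises the UnicodeEncodeError, so str(x).encode('utf-8') fails before .encode is ever reached. Below is a minimal sketch of how the join inside the logging wrapper might be adapted. The helper name to_utf8 and the sample arguments are hypothetical, and I am assuming the rest of the concatenated message consists of plain byte strings.

```python
# -*- coding: utf-8 -*-
import sys

# `unicode` only exists on Python 2; on Python 3 the text type is `str`.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def to_utf8(value):
    """Hypothetical helper: return `value` as UTF-8 encoded bytes."""
    if isinstance(value, text_type):
        return value.encode('utf-8')        # unicode -> bytes (arrow 2 in the diagram)
    if isinstance(value, bytes):
        return value                        # assumed to be encoded already
    return text_type(value).encode('utf-8')  # numbers, None, etc.


# Made-up arguments standing in for the call l.info('EMAIL SUBJECT: ', header['value'])
args = (u'EMAIL SUBJECT: ', u'Pr\xeat')

# Replaces ' '.join([str(x) for x in args]) from the traceback
message = b' '.join(to_utf8(x) for x in args)
```

If other parts of the log line (for example caller.filename) turn out to be unicode objects, they would need the same treatment, otherwise mixing them with encoded byte strings can fail in the other direction with a UnicodeDecodeError.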