
How to use .encode('utf-8') in Python?

I'm administering some Python code in which I now see an error in the logs:

Traceback (most recent call last):
  File "./app/core.py", line 772, in scrapeEmail
    l.info('EMAIL SUBJECT: ', header['value'])
  File "./app/__init__.py", line 44, in info
    logging.info(str(datetime.utcnow()) + ' INFO     ' + caller.filename + ':' + str(caller.lineno) + ' - ' + ' '.join([str(x) for x in args]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 25: ordinal not in range(128)

which I guess means that header['value'] contains differently encoded characters.

I searched around, and this SO answer suggests to "put .encode('utf-8') at the end of the object for recent versions of Python".

This raised two questions for me:

  1. On what object do I need to call .encode('utf-8'): on x or on str(x)? In other words, should it be str(x.encode('utf-8')) or str(x).encode('utf-8')?
  2. What does the writer mean with "recent versions of Python"? Can I still use .encode('utf-8') in Python 2.7?

Normally I would simply try it, but it is not easy (actually impossible) to find the string on which the error occurred. So I can't really test it.

A little help would be greatly appreciated here.

kramer65 asked Mar 13 '23 23:03


1 Answer

I suggest you first get a clear picture of the relationship between unicode and other encodings (e.g. GB2312, GBK). Once you have that, encoding and decoding stop being a major problem :)

The following diagram shows the relationship. Once you get the main point of it, you will know when and how to encode and decode in your code. :)

---------              -----------             ----------
|       |  1.decode(A) |         | 2.encode(B) |        |
|   A   | -----------> | unicode | ----------->|   B    |
|       | <----------- |         | <---------- |        |
|       |  4.encode(A) |         | 3.decode(B) |        |
---------              -----------             ----------
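To make the arrows concrete, here is a small sketch that walks all four of them, assuming A is GBK and B is UTF-8. The sample text and the variable names are only for illustration; they are not from the question.

```python
# Sketch of the four arrows in the diagram, with A = GBK and B = UTF-8.
# The sample text and variable names are illustrative only.

text = u'\u4f60\u597d'                 # a unicode string (two Chinese characters)

bytes_a = text.encode('gbk')           # arrow 4: unicode -> A
decoded = bytes_a.decode('gbk')        # arrow 1: A -> unicode
bytes_b = decoded.encode('utf-8')      # arrow 2: unicode -> B
round_trip = bytes_b.decode('utf-8')   # arrow 3: B -> unicode

# The text is the same throughout; only the byte representation differs.
print(repr(bytes_a))
print(repr(bytes_b))
print(round_trip == text)
```

Note that there is no direct arrow between A and B: to convert between two byte encodings you always go through unicode.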

So, according to the diagram, you need to know which encoding your data is in now and which encoding you want to convert it to, and then follow the arrows between them.
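Applied to the logging line in the question, that means encoding x itself before anything calls str() on it. In Python 2.7, str(x) on a unicode object first tries an implicit ASCII encode, which is exactly what raises the UnicodeEncodeError, so str(x).encode('utf-8') fails before .encode is ever reached. The .encode('utf-8') method itself is available on unicode objects in 2.7. Below is a rough sketch of a helper for this; the name to_utf8_bytes is made up for the example, and I am assuming the values passed to the logger are either unicode, byte strings, or plain objects like numbers.

```python
def to_utf8_bytes(x):
    """Return x as a UTF-8 encoded byte string.

    Hypothetical helper, not part of the code in the question.
    """
    try:
        text_type = unicode          # Python 2: the unicode type
    except NameError:
        text_type = str              # Python 3: str is already unicode

    if isinstance(x, text_type):
        # Encode the unicode object directly; never call str() on it first.
        return x.encode('utf-8')
    if isinstance(x, bytes):
        # Already a byte string (this is plain str on Python 2); leave as is.
        return x
    # Anything else (int, None, ...): build its text form, then encode it.
    return text_type(x).encode('utf-8')


subject = u'caf\xea'                 # contains u'\xea', as in the traceback
encoded = to_utf8_bytes(subject)
```

In the logger from the traceback, something like ' '.join([to_utf8_bytes(x) for x in args]) would then replace the str(x) call on Python 2.7. On Python 3 the helper returns bytes, so the join would need a bytes separator there.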

Ryan Chou answered Mar 17 '23 03:03