How to use .encode('utf-8') in Python?

Question

I'm administering some Python code in which I now see an error in the logs:

Traceback (most recent call last):
  File "./app/core.py", line 772, in scrapeEmail
    l.info('EMAIL SUBJECT: ', header['value'])
  File "./app/__init__.py", line 44, in info
    logging.info(str(datetime.utcnow()) + ' INFO     ' + caller.filename + ':' + str(caller.lineno) + ' - ' + ' '.join([str(x) for x in args]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 25: ordinal not in range(128)

which I guess means that header['value'] contains non-ASCII characters (u'\xea' is "ê").
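The last frame of the traceback calls `str(x)` on each log argument. On Python 2, `str()` applied to a `unicode` object implicitly encodes it with the default ASCII codec, which fails on any code point above 127 with the reason "ordinal not in range(128)". The sketch below makes that implicit step explicit and reports where it fails. `first_unencodable` is a hypothetical helper and the subject string is made up; the code runs on Python 2.7 and 3.

```python
def first_unencodable(text, codec='ascii'):
    """Return (index, character) of the first character `codec` cannot encode, else None."""
    try:
        # On Python 2, str(unicode_obj) performs this encode implicitly with 'ascii'.
        text.encode(codec)
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character the codec rejected.
        return exc.start, text[exc.start]
    return None


# Hypothetical subject; u'\xea' is code point 234, outside ASCII's range 0-127.
print(first_unencodable(u'Cr\xeape'))           # the ASCII codec fails on u'\xea'
print(first_unencodable(u'Cr\xeape', 'utf-8'))  # UTF-8 can encode this character
```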

I searched around, and this SO answer suggests that I "put .encode('utf-8') at the end of the object for recent versions of Python".

This raised two questions for me:

1. On which object do I need to call .encode('utf-8'): on x or on str(x)? In other words, should it be str(x.encode('utf-8')) or str(x).encode('utf-8')?
2. What does the writer mean by "recent versions of Python"? Can I still use .encode('utf-8') in Python 2.7?

Normally I would simply try it, but it is not easy (actually impossible) to find the string on which the error occurred, so I can't really test it.

A little help would be greatly appreciated here.

Ryan Chou · Accepted Answer

I suggest that you first get a clear understanding of the relationship between Unicode and other encodings (e.g. GB2312, GBK). Once you do, encoding and decoding will no longer be a major problem. :)

The following diagram shows the relationship. Once you grasp its main point, you will know when and how to encode and decode in your code. :)

---------              -----------             ----------
|       |  1.decode(A) |         | 2.encode(B) |        |
|   A   | -----------> | unicode | ----------->|   B    |
|       | <----------- |         | <---------- |        |
|       |  4.encode(A) |         | 3.decode(B) |        |
---------              -----------             ----------
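The four arrows map directly onto `.decode()` and `.encode()` calls. A minimal sketch, assuming A is GBK and B is UTF-8 (the pair is only an example); `transcode` is a hypothetical helper name. It runs on Python 2.7 and 3.

```python
def transcode(data, source, target):
    """Convert bytes in encoding `source` (A) to bytes in encoding `target` (B)."""
    text = data.decode(source)   # arrow 1: A -> unicode
    return text.encode(target)   # arrow 2: unicode -> B


text = u'\u4e2d\u6587'           # two CJK characters (U+4E2D, U+6587) as a unicode object
gbk_bytes = text.encode('gbk')   # arrow 4: unicode -> A (2 bytes per character here)
utf8_bytes = transcode(gbk_bytes, 'gbk', 'utf-8')  # 3 bytes per character in UTF-8
assert utf8_bytes.decode('utf-8') == text          # arrow 3: B -> unicode
```

The byte sequences differ between the two encodings, but both decode back to the same unicode text, which is why the middle box is the only safe place to convert between them.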

So, according to the diagram, you need to know which encoding your data is in now and which encoding you want to convert it to, and then follow the arrows the diagram shows.
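A minimal sketch applying the diagram to the traceback in the question, assuming header['value'] is a `unicode` object on Python 2. The subject is already in the middle box (unicode), and Python 2's `str` is a byte string, so the encode has to happen on `x` before `str()` sees it: `str(x.encode('utf-8'))`, which on Python 2 is equivalent to just `x.encode('utf-8')`. The other order, `str(x).encode('utf-8')`, still fails because `str(x)` raises before `.encode` runs. `.encode()` on `unicode` objects is available in Python 2.7. `to_log_str` and `format_args` are hypothetical names for a replacement of the list comprehension in `info()`; the version check lets the same code run on Python 3.

```python
import sys

PY2 = sys.version_info[0] == 2


def to_log_str(x):
    """Return x as the native str type without going through the implicit ASCII codec."""
    if PY2 and isinstance(x, unicode):  # noqa: F821 -- `unicode` exists only on Python 2
        # unicode -> UTF-8 bytes; Python 2's str type *is* bytes,
        # so no further str() call (and no ASCII encode) is needed.
        return x.encode('utf-8')
    # Everything else (and every value on Python 3) goes through str() as before.
    return str(x)


def format_args(*args):
    """Hypothetical stand-in for the ' '.join([...]) part of info()."""
    return ' '.join([to_log_str(x) for x in args])


print(format_args('EMAIL SUBJECT: ', u'Cr\xeape'))
```

On Python 3 the extra `.encode('utf-8')` is not needed and is actually harmful here: `str(u'Cr\xeape'.encode('utf-8'))` returns the text `b'Cr\xc3\xaape'` (the repr of the bytes) instead of the subject itself.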

Tags: python, string, character-encoding, encoding, utf-8

kramer65
