How to convert a unicode list of tuples into utf-8 with python

Question

My function returns a tuple which is then assigned to a variable x and appended to a list.

x = (u'string1', u'string2', u'string3', u'string4')
resultsList.append(x)

The function is called multiple times and the final list consists of 20 tuples.

The strings within the tuple are in unicode and I would like to convert them to utf-8.

Some of the strings also include non-ASCII characters like ö, ä, etc.

Is there a way to convert them all in one step?

Martijn Pieters · Accepted Answer

Use a nested list comprehension:

encoded = [[s.encode('utf8') for s in t] for t in resultsList]

This produces a list of lists containing byte strings of UTF-8 encoded data.
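If the inner tuples should stay tuples rather than become lists, one option is to wrap the inner generator in tuple(). A minimal sketch, assuming sample values for resultsList that are made up for illustration (non-ASCII characters are written as escapes so the snippet does not depend on the source file encoding):

```python
# Hypothetical sample data standing in for the question's resultsList.
resultsList = [
    (u'string1', u'K\xf6ln', u'string3', u'string4'),         # \xf6 is ö
    (u'Kaiserstra\xdfe', u'string2', u'string3', u'string4'),  # \xdf is ß
]

# tuple(...) around the inner generator keeps each row a tuple
# instead of turning it into a list.
encoded = [tuple(s.encode('utf8') for s in t) for t in resultsList]
```

The result has the same shape as the input, a list of tuples, with every element replaced by its UTF-8 byte string.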

If you print these lists, you'll see Python represent the contents of the byte strings as Python string literals, with quotes, and with any bytes that are not printable ASCII codepoints represented as escape sequences:

>>> l = ['Kaiserstra\xc3\x9fe']
>>> l
['Kaiserstra\xc3\x9fe']
>>> l[0]
'Kaiserstra\xc3\x9fe'
>>> print l[0]
Kaiserstraße

This is normal; Python presents the data this way for debugging purposes. The \xc3 and \x9f escape sequences represent the two UTF-8 bytes C3 9F (hexadecimal) that are used to encode the ß character (eszett, or sharp s).
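To check that the escape sequences are only a display detail, you can inspect the raw bytes and decode them again. A small sketch, with variable names chosen here for illustration:

```python
import binascii

text = u'Kaiserstra\xdfe'   # \xdf is the ß codepoint, U+00DF
raw = text.encode('utf8')   # UTF-8 byte string

# ß takes two bytes in UTF-8, so the byte string is one longer than the text.
char_count = len(text)
byte_count = len(raw)

# Hex dump of the two bytes that sit just before the final 'e'.
eszett_hex = binascii.hexlify(raw[-3:-1])

# Decoding gives back the original unicode string unchanged.
roundtrip = raw.decode('utf8')
```

Here char_count is 12, byte_count is 13, eszett_hex holds the hex digits c39f, and roundtrip compares equal to text, so no data is lost by encoding.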

Tags:

python

unicode

utf-8

user2560609
