How to convert a unicode list of tuples into utf-8 with python

My function returns a tuple which is then assigned to a variable x and appended to a list.

x = (u'string1', u'string2', u'string3', u'string4')
resultsList.append(x)

The function is called multiple times, and the final list consists of 20 tuples.

The strings within each tuple are unicode objects, and I would like to convert them to UTF-8.

Some of the strings also include non-ASCII characters like ö, ä, etc.

Is there a way to convert them all in one step?

user2560609 asked Dec 26 '22 00:12


1 Answer

Use a nested list comprehension:

encoded = [[s.encode('utf8') for s in t] for t in resultsList]

This produces a list of lists containing byte strings of UTF-8 encoded data.
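If the inner tuples should stay tuples rather than become lists, a small variation is possible. The following is a minimal sketch, assuming sample data in place of the real resultsList; the helper name encode_rows is hypothetical and not part of the question.

```python
# -*- coding: utf-8 -*-
# Sample data standing in for the real resultsList (an assumption).
resultsList = [
    (u'string1', u'string2', u'string3', u'string4'),
    (u'Kaiserstra\xdfe', u'\xf6', u'\xe4', u'plain'),
]


def encode_rows(rows, encoding='utf-8'):
    """Return a new list in which every tuple holds encoded byte strings."""
    # tuple(...) keeps each row a tuple; the original list is left unchanged.
    return [tuple(s.encode(encoding) for s in row) for row in rows]


encoded = encode_rows(resultsList)
```

Wrapping the inner generator in tuple() is the only difference from the list comprehension above; the encoding step itself is the same.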

If you print these lists, Python shows the contents of the byte strings as Python string literals: with quotes, and with any bytes that are not printable ASCII characters represented as escape sequences:

>>> l = ['Kaiserstra\xc3\x9fe']
>>> l
['Kaiserstra\xc3\x9fe']
>>> l[0]
'Kaiserstra\xc3\x9fe'
>>> print l[0]
Kaiserstraße

This is normal; Python presents the data this way for debugging purposes. The \xc3 and \x9f escape sequences represent the two UTF-8 bytes C3 9F (hexadecimal) that encode the ß character (the German sharp s, or Eszett).
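The relationship between the character and its two bytes can be checked directly. This is a small sketch, with the variable names sharp_s and utf8_bytes chosen only for illustration:

```python
# U+00DF is the code point of the German sharp s.
sharp_s = u'\xdf'
utf8_bytes = sharp_s.encode('utf-8')

# The encoded form is exactly the two bytes 0xC3 0x9F.
assert utf8_bytes == b'\xc3\x9f'
assert len(utf8_bytes) == 2

# Decoding the bytes restores the original one-character text string.
assert utf8_bytes.decode('utf-8') == sharp_s
```

The escape sequences in the printed list are therefore only a display form; the underlying data is already the correct UTF-8.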

Martijn Pieters answered Dec 30 '22 11:12