My function returns a tuple which is then assigned to a variable x and appended to a list.
x = (u'string1', u'string2', u'string3', u'string4')
resultsList.append(x)
The function is called multiple times, and the final list consists of 20 tuples.
The strings within the tuples are Unicode strings, and I would like to convert them to UTF-8.
Some of the strings also include non-ASCII characters such as ö and ä.
Is there a way to convert them all in one step?
Use a nested list comprehension:
encoded = [[s.encode('utf8') for s in t] for t in resultsList]
This produces a list of lists containing byte strings of UTF-8 encoded data.
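A minimal sketch, assuming you want to keep the tuples rather than get lists back: wrapping the inner comprehension in tuple() does that. The sample data below is hypothetical, and the literals use the u'' prefix with \u escapes so the snippet runs under both Python 2 and Python 3.3+.

```python
# Hypothetical sample data standing in for the function's results.
# \u00f6 is o-umlaut, \u00e4 is a-umlaut, \u00df is sharp s.
resultsList = [
    (u'string1', u'K\u00f6ln', u'M\u00e4rz', u'Kaiserstra\u00dfe'),
    (u'string2', u'string3', u'string4', u'string5'),
]

# Same nested comprehension, but each inner sequence is rebuilt as a tuple
# of UTF-8 encoded byte strings instead of a list.
encoded = [tuple(s.encode('utf8') for s in t) for t in resultsList]
```

Each element of encoded is then a tuple of byte strings; for example, encoded[0][1] is the two-byte-encoded 'Köln', b'K\xc3\xb6ln'.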
If you print these lists, you'll see Python represent the contents of the byte strings as Python string literals: with quotes, and with any bytes that are not printable ASCII characters shown as escape sequences:
>>> l = ['Kaiserstra\xc3\x9fe']
>>> l
['Kaiserstra\xc3\x9fe']
>>> l[0]
'Kaiserstra\xc3\x9fe'
>>> print l[0]
Kaiserstraße
This is normal; Python presents the data this way for debugging purposes. The \xc3 and \x9f escape sequences represent the two bytes C3 and 9F (hexadecimal) that UTF-8 uses to encode the lowercase ß (sharp s, or Eszett) character.
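To check this yourself, you can encode the character on its own and inspect the resulting bytes; decoding them again gives back the original Unicode string. A small sketch, written so it runs on both Python 2 and Python 3:

```python
import binascii

sharp_s = u'\u00df'  # LATIN SMALL LETTER SHARP S (ß)
encoded = sharp_s.encode('utf8')

# UTF-8 needs two bytes for this character: 0xC3 followed by 0x9F.
hex_bytes = binascii.hexlify(encoded)
byte_values = list(bytearray(encoded))  # integer values of each byte

# Decoding the UTF-8 bytes recovers the original Unicode string.
roundtrip = encoded.decode('utf8')

# The same two bytes appear inside the encoded word from the session above.
word = u'Kaiserstra\u00dfe'.encode('utf8')
```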