How to strip unicode in a list

Question

I want to strip the unicode strings in a list, for example airports = [u'KATL', u'KCID']

expected output

[KATL,KCID]

I followed the linked question below:

Strip all the elements of a string list

I tried one of the solutions:

>>> my_list = ['this ', 'is ', 'a ', 'list ', 'of ', 'words ']
>>> map(str.strip, my_list)
['this', 'is', 'a', 'list', 'of', 'words']

but with my list I got the following error:

TypeError: descriptor 'strip' requires a 'str' object but received a 'unicode'

randomir · Accepted Answer

First, I strongly suggest you switch to Python 3, which treats Unicode strings as first-class citizens (all strings are Unicode strings, but they are called str).

But if you have to make it work in Python 2, you can strip unicode strings with unicode.strip (if your strings are true Unicode strings):

>>> lst = [u'KATL\n', u'KCID\n']
>>> map(unicode.strip, lst)
[u'KATL', u'KCID']
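
As a version-independent alternative, calling .strip() on each element avoids naming the string type at all, so the same line should work whether the elements are unicode (Python 2) or str (Python 3). A minimal sketch, where the trailing newlines and the variable names are illustrative rather than taken from the question:

```python
# Illustrative input: airport codes with trailing newlines.
airports = [u'KATL\n', u'KCID\n']

# Call strip() on each element instead of passing str.strip or
# unicode.strip to map(); the method is looked up on the element's own type.
stripped = [code.strip() for code in airports]

# Join the codes for display without quotes or the u prefix.
print('[' + ','.join(stripped) + ']')  # prints [KATL,KCID]
```

The u prefix is only part of how Python 2 displays a unicode object inside a list; joining the elements into one string, as in the last line, is one way to get the output format asked for in the question.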

If your unicode strings are limited to the ASCII subset, you can convert them to str with:

>>> lst = [u'KATL', u'KCID']
>>> map(str, lst)
['KATL', 'KCID']

Note that this conversion will fail for non-ASCII strings. To encode Unicode code points as a str (a string of bytes), you have to choose an encoding (usually UTF-8) and call the .encode() method on your strings:

>>> lst = [u'KATL', u'KCID']
>>> map(lambda x: x.encode('utf-8'), lst)
['KATL', 'KCID']
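
To make the failure case concrete, here is a small sketch with a non-ASCII value. The string is a hypothetical example, not from the question, and the printed results assume Python 3, where .encode() returns a bytes object:

```python
# Hypothetical non-ASCII value (u-umlaut written as an escape).
city = u'Z\u00fcrich'

# UTF-8 can represent every code point, so this succeeds.
utf8_bytes = city.encode('utf-8')

# ASCII cannot represent the u-umlaut, so encoding raises UnicodeEncodeError.
try:
    city.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(utf8_bytes)    # b'Z\xc3\xbcrich' on Python 3
print(ascii_failed)  # True
```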

Serge Ballesta · Answer

The only reliable way to convert a unicode string to a byte string is to encode it with a suitable encoding (ASCII, Latin-1 and UTF-8 are the most common ones). By definition, UTF-8 can encode any Unicode character, but the resulting byte string may contain non-ASCII bytes, and its size in bytes will no longer equal the number of (Unicode) characters. Latin-1 can represent the characters of most Western European languages with one byte per character, and ASCII is the set of characters that are always correctly represented.
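
The size difference can be checked directly. A short sketch, using a made-up six-character string with one non-ASCII character:

```python
# Hypothetical sample: six characters, one of them outside ASCII.
text = u'Z\u00fcrich'

utf8 = text.encode('utf-8')      # the u-umlaut takes two bytes
latin1 = text.encode('latin-1')  # one byte per character

char_count = len(text)
utf8_size = len(utf8)
latin1_size = len(latin1)

print(char_count, utf8_size, latin1_size)  # 6 7 6 on Python 3
```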

If you want to be able to process strings containing characters that cannot be represented in the chosen charset, you can pass errors='ignore' to simply drop them, or errors='replace' to substitute a replacement character, often ?.

So if I have understood your requirement correctly, you could translate the list of unicode strings into a list of byte strings with:

[ x.encode('ascii', errors='replace') for x in my_list ]
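
For comparison, the two error handlers behave as follows on a list that contains one non-ASCII entry. The second element is a hypothetical value added for illustration, and the results in the comments are shown as Python 3 bytes:

```python
# The second element is a made-up non-ASCII value for illustration.
my_list = [u'KATL', u'Z\u00fcrich']

# 'replace' substitutes '?' for each character ASCII cannot represent.
replaced = [x.encode('ascii', errors='replace') for x in my_list]

# 'ignore' silently drops those characters instead.
ignored = [x.encode('ascii', errors='ignore') for x in my_list]

print(replaced)  # [b'KATL', b'Z?rich'] on Python 3
print(ignored)   # [b'KATL', b'Zrich'] on Python 3
```

Both handlers lose information, so they are probably best reserved for cases where the dropped characters do not matter.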

Tags:

python

unicode

Hariom Singh
