I want to strip the Unicode strings in a list, for example airports = [u'KATL', u'KCID'].
Expected output:
[KATL,KCID]
I followed the question linked below:
Strip all the elements of a string list
I tried one of its solutions:
>>> my_list = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
>>> map(str.strip, my_list)
['this', 'is', 'a', 'list', 'of', 'words']
but with my list I got the following error:
TypeError: descriptor 'strip' requires a 'str' object but received a 'unicode'
First, I strongly suggest you switch to Python 3, which treats Unicode strings as first-class citizens (all strings are Unicode strings, but they are called str).
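As a minimal sketch of that route, assuming a Python 3 interpreter and the airport codes from the question with trailing newlines added for illustration, str.strip can be applied directly:

```python
# Python 3: every string literal is a Unicode str, so str.strip accepts it.
airports = ['KATL\n', 'KCID\n']

# map() returns a lazy iterator in Python 3, so wrap it in list().
stripped = list(map(str.strip, airports))
print(stripped)
```

A list comprehension such as [s.strip() for s in airports] does the same thing and is often considered more readable.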
But if you have to make it work in Python 2, you can strip unicode strings with unicode.strip (if your strings are true Unicode strings):
>>> lst = [u'KATL\n', u'KCID\n']
>>> map(unicode.strip, lst)
[u'KATL', u'KCID']
If your unicode strings are limited to the ASCII subset, you can convert them to str with:
>>> lst = [u'KATL', u'KCID']
>>> map(str, lst)
['KATL', 'KCID']
Note that this conversion will fail for non-ASCII strings. To encode Unicode code points as a str (string of bytes), you have to choose an encoding (usually UTF-8) and call the .encode() method on your strings:
>>> lst = [u'KATL', u'KCID']
>>> map(lambda x: x.encode('utf-8'), lst)
['KATL', 'KCID']
The only reliable way to convert a unicode string to a byte string is to encode it with a suitable encoding (ASCII, Latin-1 and UTF-8 are the most common ones). By definition, UTF-8 can encode any Unicode character, but the resulting byte string will contain non-ASCII bytes, and its size in bytes will no longer equal the number of (Unicode) characters. Latin-1 can represent the characters of most Western European languages with exactly one byte per character, and ASCII is the subset of characters that is represented identically in all of these encodings.
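To make the size difference concrete, here is a small sketch. The sample word is my own (hypothetical, not from the question) and was chosen because it contains one non-ASCII character:

```python
# Hypothetical sample: 4 characters, the last one (e with acute accent)
# is outside the ASCII range.
text = u'caf\xe9'

utf8_bytes = text.encode('utf-8')      # the accented letter takes two bytes
latin1_bytes = text.encode('latin-1')  # the accented letter takes one byte

# Plain ASCII cannot represent the accented letter at all.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(len(text), len(utf8_bytes), len(latin1_bytes), ascii_ok)
```

The character count stays at 4, while the UTF-8 form is 5 bytes long and the Latin-1 form is 4 bytes long.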
If you want to be able to process strings containing characters that are not representable in the chosen charset, you can pass errors='ignore' to just remove them, or errors='replace' to replace them with a replacement character, often ?.
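A short sketch of the two error handlers, again using a made-up sample word (an assumption for illustration, not data from the question):

```python
# Hypothetical sample with one character that ASCII cannot represent.
text = u'caf\xe9'

# 'ignore' silently drops the unencodable character.
ignored = text.encode('ascii', errors='ignore')

# 'replace' substitutes a question mark for it.
replaced = text.encode('ascii', errors='replace')

print(ignored, replaced)
```

Both handlers lose information, so they only make sense when the dropped or replaced characters really do not matter for your use case.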
So if I have correctly understood your requirement, you could translate the list of unicode strings into a list of byte strings with:
[x.encode('ascii', errors='replace') for x in my_list]