I find Unicode in Python really troublesome. Why doesn't Python use UTF-8 for all strings? I am in China, so I have to use Chinese strings that can't be represented in ASCII. I use u''
to denote a string. It works well on my Ubuntu machine, but on another Ubuntu machine (a VPS provided by linode.com) it sometimes fails. The error is:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
The code I am using is:
self.talk(user.record["fullname"] + u"准备好了")
The famous UnicodeDecodeError
typically shows up when you do string manipulation like the one you just did:
user.record["fullname"] + u" 准备好了"
Here you are concatenating a str with a unicode, so Python implicitly coerces the str to unicode before concatenating. The coercion is equivalent to:
unicode(user.record["fullname"]) + u" 准备好了"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Problem
That is where the problem lies: when you call unicode(something),
Python decodes the string using the default encoding, which is ASCII in Python 2.*. If your string user.record["fullname"]
contains any non-ASCII character, it raises the famous UnicodeDecodeError.
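As a rough sketch of what that implicit coercion amounts to, the snippet below performs the ASCII decode explicitly, so it behaves the same on Python 2 and Python 3. The name fullname_bytes is a hypothetical stand-in for user.record["fullname"], assuming the record holds UTF-8 encoded bytes:

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for user.record["fullname"]: the UTF-8 bytes of a
# Chinese name. The first byte is 0xe9, as in the error from the question.
fullname_bytes = u"陈".encode("utf-8")

# Python 2 performs this ASCII decode implicitly when str meets unicode.
try:
    fullname_bytes.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message should name the 'ascii' codec and the offending byte 0xe9, just like the traceback in the question.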
So how can you solve it:
# Decode the str to unicode using the right encoding.
# I used utf-8 here because it is usually the right one, but it may not be (another problem!).
a = user.record["fullname"].decode('utf-8')
self.talk(a + u" 准备好了")
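If the value is sometimes already unicode and sometimes a byte string (which could explain why the code fails only some of the time), a small helper can make the decode step safe in both cases. This is only a sketch: to_text is a hypothetical helper name, and UTF-8 is an assumed encoding that may not match your data.

```python
# -*- coding: utf-8 -*-
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string."""
    # On Python 2, bytes is an alias for str, so this check covers both versions.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Hypothetical stand-in for user.record["fullname"] stored as UTF-8 bytes.
fullname = u"陈".encode("utf-8")
message = to_text(fullname) + u" 准备好了"
print(message)
```

Decoding once at the boundary where the data enters the program, and using unicode everywhere after that, avoids scattering .decode() calls around every concatenation.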
PS: In Python 3 the default encoding is UTF-8, and you can no longer concatenate a string (unicode) with bytes, so there is no more implicit coercion.
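A minimal sketch of the Python 3 behaviour, assuming it is run under Python 3 (the variable names are only illustrative): mixing bytes and str raises a TypeError straight away instead of attempting a silent ASCII decode.

```python
# -*- coding: utf-8 -*-
import sys

# On Python 3 this reports 'utf-8'.
default_encoding = sys.getdefaultencoding()

raw = u"陈".encode("utf-8")  # bytes, as they might arrive from a file or database

# Mixing bytes and str is rejected outright on Python 3.
try:
    raw + u" 准备好了"
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The explicit decode is still the fix.
decoded = raw.decode("utf-8") + u" 准备好了"
print(default_encoding, mixed_ok, decoded)
```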