Python unicode: why does it work on one machine but sometimes fail on another?

I find unicode in Python really troublesome. Why doesn't Python use UTF-8 for all strings? I am in China, so I have to use Chinese strings that can't be represented in ASCII. I use u'' to denote such a string. It works well on my Ubuntu machine, but on another Ubuntu machine (a VPS provided by linode.com) it sometimes fails. The error is:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

The code I am using is:

self.talk(user.record["fullname"] + u"准备好了")
Bin Chen asked Dec 12 '22 17:12


1 Answer

The famous UnicodeDecodeError typically shows up when you do string manipulation like the one in your code:

user.record["fullname"] + u" 准备好了"

What you're doing here is concatenating a str with a unicode, so Python implicitly coerces the str to unicode before concatenating. The coercion is done like this:

unicode(user.record["fullname"]) + u" 准备好了"
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         Problem

And that is where the problem lies: when you call unicode(something) without an explicit encoding, Python decodes the string using the default encoding, which is ASCII in Python 2.*. If your string user.record["fullname"] happens to contain a non-ASCII character, it raises the famous UnicodeDecodeError. That also explains why it only fails sometimes: a str containing only ASCII characters is coerced without error.
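
A minimal sketch of the failure described above, assuming the stored name is a UTF-8 encoded byte string (raw_fullname is a hypothetical stand-in for user.record["fullname"]). It performs the ASCII decode explicitly, which is what the implicit coercion does in Python 2, so it should run on both Python 2 and Python 3:

```python
# Hypothetical stand-in for user.record["fullname"]:
# the UTF-8 bytes of the Chinese character U+9648.
raw_fullname = b"\xe9\x99\x88"

try:
    # Python 2 does the equivalent of this implicitly for str + unicode.
    raw_fullname.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    # The first byte, 0xe9, is outside the ASCII range (0-127).
    error_message = str(exc)

print(error_message)
```

The first byte of the name is 0xe9, which matches the byte reported in the error from the question.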

So how can you solve it:

# Decode the str to unicode using the right encoding.
# utf-8 is used here because it is usually the right one, but it may not be for your data (another problem!).
a = user.record["fullname"].decode('utf-8')

self.talk(a + u" 准备好了")
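
The question does not say whether user.record always returns a byte string, so a slightly more defensive variant may be useful. The sketch below uses a hypothetical helper, to_text, that decodes only when it is given bytes and passes text through unchanged. The UTF-8 default is an assumption about the data, not something the question confirms:

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it only if it is a byte string.

    to_text is a hypothetical helper; the utf-8 default is an assumption.
    """
    if isinstance(value, bytes):  # bytes is an alias of str in Python 2
        return value.decode(encoding)
    return value


# Works for a byte string (hypothetical stored name) ...
fullname = to_text(b"\xe9\x99\x88")
message = fullname + u" 准备好了"

# ... and for a value that is already text.
already_text = to_text(u"Bin")
```

Decoding once at the boundary, where the data enters the program, keeps every later concatenation unicode-only.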

PS: In Python 3 the default encoding is UTF-8. Also, you can't concatenate a unicode string (str in Python 3) with a byte string (bytes in Python 3), so there is no more implicit coercion.
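
A short sketch of the Python 3 behaviour mentioned in the PS, assuming a Python 3 interpreter (under Python 2 the same concatenation would raise UnicodeDecodeError instead). The byte string is again a hypothetical stand-in for the stored name:

```python
import sys

# On Python 3 this is 'utf-8'.
default_encoding = sys.getdefaultencoding()

try:
    # bytes + str is rejected outright instead of being coerced.
    mixed = b"\xe9\x99\x88" + u" 准备好了"
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

print(default_encoding, mixing_allowed)
```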

mouad answered Dec 29 '22 00:12