a = {"a":"çö"}
b = "çö"
a['a']
>>> '\xc3\xa7\xc3\xb6'
b.decode('utf-8') == a['a']
>>> False
What is going on here?
Edit: I'm sorry, it was my mistake. It is still False. I'm using Python 2.6 on Ubuntu 10.04.
Strings are compared by character value: a character with a higher Unicode code point sorts after one with a lower code point. There is no special function for testing equality; use the '==' operator, which returns True if the two strings are identical and False otherwise.
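As a minimal sketch of that rule (written for Python 3, with the variable names `first` and `second` chosen only for illustration), the snippet below compares two identical strings and shows the code points that drive ordering:

```python
# Two strings built from the same code points: U+00E7 (ç) and U+00F6 (ö).
first = "\u00e7\u00f6"
second = "\u00e7\u00f6"

# Equality holds because both strings contain identical code points.
print(first == second)

# ord() exposes the integer code point behind each character.
print([ord(ch) for ch in first])

# Ordering follows code points: ord("z") is 122, ord("ç") is 231.
print("z" < "\u00e7")
```

Writing the characters as escapes keeps the example independent of the source file's encoding.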
UTF-8 is one of the most commonly used encodings, and Python 3 uses it by default for source files and for str.encode() and bytes.decode().
In Python 2, the default encoding is ASCII (unfortunately). UTF-16 is a variable-width encoding that uses 2 or 4 bytes per character. It is compact for Asian text, most of which fits in 2 bytes per character, but wasteful for English, since every English character also needs 2 bytes.
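To make the size trade-off concrete, here is a small Python 3 sketch that encodes three sample strings (the samples and variable names are assumptions picked for illustration) and counts the resulting bytes:

```python
# Sample texts: plain English, accented Latin, and Chinese.
samples = {
    "english": "hello",
    "accented": "\u00e7\u00f6",   # çö
    "cjk": "\u4e2d\u6587",        # 中文
}

sizes = {}
for name, text in samples.items():
    utf8_len = len(text.encode("utf-8"))
    # "utf-16-le" is used so the 2-byte byte order mark is not counted.
    utf16_len = len(text.encode("utf-16-le"))
    sizes[name] = (utf8_len, utf16_len)
    print(name, utf8_len, utf16_len)
```

For these samples, the English text doubles in size under UTF-16, the accented text is the same size in both encodings, and the Chinese text is smaller in UTF-16 than in UTF-8.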
In Python 3, all strings are sequences of Unicode characters, and a separate bytes type holds raw bytes. The distinction is not "Unicode versus ASCII"; it is purely a distinction between two Python types.
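The following Python 3 sketch illustrates the two types side by side. The names `text` and `raw` are illustrative only:

```python
text = "\u00e7\u00f6"          # str: two Unicode characters (çö)
raw = text.encode("utf-8")     # bytes: the four bytes C3 A7 C3 B6

print(type(text).__name__, len(text))
print(type(raw).__name__, len(raw))

# A bytes object never compares equal to a str, even if it encodes that text.
mixed_equal = (raw == text)
print(mixed_equal)

# Decoding first puts both sides in the same type, so the comparison works.
decoded_equal = (raw.decode("utf-8") == text)
print(decoded_equal)
```

Unlike Python 2, Python 3 performs no implicit conversion between the two types, so the mismatch shows up as a plain False rather than a decoding attempt.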
Either write like this:
a = {"a": u"çö"}
b = "çö"
b.decode('utf-8') == a['a']
Or like this (you may also skip the .decode('utf-8') on both sides):
a = {"a": "çö"}
b = "çö"
b.decode('utf-8') == a['a'].decode('utf-8')
Or like this (my recommendation):
a = {"a": u"çö"}
b = u"çö"
b == a['a']
Updated based on Tim's comment. In your original code, b.decode('utf-8') == u'çö' and a['a'] == 'çö', so you're actually making the following comparison:
u'çö' == 'çö'
One of the objects is of type unicode, the other is of type str, so in order to execute the comparison, the str is converted to unicode and then the two unicode objects are compared. It works fine in the case of purely ASCII strings, for example: u'a' == 'a', since unicode('a') == u'a'.
However, it fails in the case of u'çö' == 'çö', since unicode('çö') raises the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128). The whole comparison therefore returns False and issues the following warning: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal.