Python UTF-8 comparison

Tags: python, unicode, python-2.x, utf-8

>>> a = {"a": "çö"}
>>> b = "çö"
>>> a['a']
'\xc3\xa7\xc3\xb6'
>>> b.decode('utf-8') == a['a']
False

What is going on here?

Edit: Sorry, that was my mistake. It is still False. I'm using Python 2.6 on Ubuntu 10.04.

asked Aug 03 '10 19:08

erkangur

1 Answer

Possible solutions

Either write like this:

a = {"a": u"çö"}
b = "çö"
b.decode('utf-8') == a['a']

Or like this (you may also skip the .decode('utf-8') on both sides):

a = {"a": "çö"}
b = "çö"
b.decode('utf-8') == a['a'].decode('utf-8')

Or like this (my recommendation):

a = {"a": u"çö"}
b = u"çö"
b == a['a']
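
If the values may arrive as either byte strings or unicode objects, one way to follow the recommendation above is to normalize everything to text before comparing. The following is only a minimal sketch: `to_text` is a hypothetical helper name (not part of the standard library), and it assumes the byte strings are UTF-8 encoded. It uses escape sequences instead of literal characters, so it should run unchanged on Python 2.6+ and Python 3:

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding it first if it is a byte string.

    Hypothetical helper; assumes byte strings are encoded with `encoding`.
    """
    if isinstance(value, bytes):  # on Python 2.6+, `bytes` is an alias of `str`
        return value.decode(encoding)
    return value


a = {"a": b"\xc3\xa7\xc3\xb6"}  # the UTF-8 bytes of "çö"
b = u"\xe7\xf6"                 # the text "çö"

print(to_text(a["a"]) == to_text(b))  # True
```

Decoding once at the boundary (file, socket, database) and working with unicode everywhere else keeps comparisons like this from depending on implicit conversions.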

Explanation

Updated based on Tim's comment. In your original code, b.decode('utf-8') == u'çö' and a['a'] == 'çö', so you're actually making the following comparison:

u'çö' == 'çö'

One of the objects is of type unicode and the other is of type str, so to perform the comparison Python 2 first converts the str to unicode using the default codec (ASCII) and then compares the two unicode objects. This works for purely ASCII strings, for example u'a' == 'a', since unicode('a') == u'a'.

However, it fails for u'çö' == 'çö', because unicode('çö') raises the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

The comparison therefore returns False and issues the following warning:

UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
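
The failing conversion can also be reproduced explicitly. The sketch below decodes the bytes from the question with the ASCII codec and with UTF-8; the variable names are only illustrative, and escape sequences are used so the result does not depend on the source file encoding (it should behave the same on Python 2.6+ and Python 3):

```python
raw = b"\xc3\xa7\xc3\xb6"  # UTF-8 bytes of "çö", as shown in the question
text = u"\xe7\xf6"         # the same two characters as text

try:
    raw.decode("ascii")    # explicit version of the implicit conversion
    ascii_ok = True
except UnicodeDecodeError as exc:
    ascii_ok = False
    print(exc)             # the codec error message

print(ascii_ok)                       # False
print(raw.decode("utf-8") == text)    # True
print(b"a".decode("ascii") == u"a")   # True: pure ASCII decodes fine
```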

answered Oct 13 '22 04:10

Bolo
