Comparing string and unicode in Python 2.7.5

Question

I wonder why, when I define:

a = [u'k',u'ę',u'ą']

and then type:

'k' in a

I get True, while:

'ę' in a

will give me False?

It really gives me a headache, and it seems as if someone did this on purpose to drive people mad...

aIKid · Accepted Answer

And why is this?

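In Python 2.x, comparing a unicode object with a byte string (str) makes Python first decode the byte string with the default ASCII codec. That works for 'k', but 'ę' consists of non-ASCII bytes, so the decoding fails, the two values are treated as unequal, and Python raises a warning: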

Warning (from warnings module):
  File "__main__", line 1
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

In Python 3.x this does not happen, because all str literals are already unicode objects.
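The failing step can be reproduced on its own. The sketch below is only an illustration: it assumes the byte string holds the UTF-8 encoding of 'ę' (what a UTF-8 terminal or source file would produce), and the helper name decodes_as_ascii is made up for this example. It is written so that it runs on both Python 2.7 and Python 3.

```python
def decodes_as_ascii(data):
    """Return True if the byte string can be decoded with the ASCII codec."""
    try:
        data.decode('ascii')
        return True
    except UnicodeDecodeError:
        # This is the failure that Python 2 turns into a UnicodeWarning
        # when it compares a byte string with a unicode object.
        return False


plain = b'k'            # pure ASCII, decodes fine
ogonek = b'\xc4\x99'    # assumed UTF-8 bytes of U+0119 (e with ogonek)

plain_ok = decodes_as_ascii(plain)
ogonek_ok = decodes_as_ascii(ogonek)
```

Here plain_ok ends up True and ogonek_ok ends up False, which mirrors why 'k' is found in the list and 'ę' is not.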

Solution?

You can either make the string a unicode literal:

>>> u'ę' in a
True

Now you're comparing two unicode objects, not a unicode object to a byte string.

Or encode the unicode object to the same encoding as the byte string, for example UTF-8, before comparing (this assumes your terminal or source file encodes 'ę' as UTF-8):

>>> c = u'ę'
>>> 'ę' == c.encode('utf-8')
True
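If byte strings and unicode objects are mixed throughout a program, one option is to normalize everything to unicode before comparing. The following is a minimal sketch, not part of the original answer: the helper names to_unicode and contains_text are hypothetical, and UTF-8 is assumed as the encoding of the incoming byte strings. It runs on both Python 2.7 and Python 3.

```python
def to_unicode(value, encoding='utf-8'):
    """Return value as text, decoding byte strings with the given encoding."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value


def contains_text(needle, items, encoding='utf-8'):
    """Membership test that compares text with text only."""
    needle = to_unicode(needle, encoding)
    return any(needle == to_unicode(item, encoding) for item in items)


# The list from the question, written with escapes to keep the file ASCII-only.
a = [u'k', u'\u0119', u'\u0105']

# b'\xc4\x99' is assumed to be the UTF-8 encoding of U+0119.
found = contains_text(b'\xc4\x99', a)
```

Decoding at the boundary like this keeps every comparison between two unicode objects, so the implicit ASCII conversion never gets a chance to fail.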

Also, to use non-ASCII characters in your source file, you have to declare its encoding at the top of the file:

# -*- coding: utf-8 -*-

# the rest of the program

Hope this helps!

Tags:

python

python-unicode

python-2.7

Kulawy Krul
