When comparing two strings in Python, it works fine. When comparing a string object with a unicode object, it fails as expected. However, when comparing a string object with a converted unicode object (unicode --> str), it also fails.
Works as expected:
>>> if 's' is 's': print "Hurrah!"
...
Hurrah!
Fails, pretty much as expected:
>>> if 's' is u's': print "Hurrah!"
...
Not expected:
>>> if 's' is str(u's'): print "Hurrah!"
...
Why doesn't the third example work as expected when both objects are of the same type?
>>> type('s')
<type 'str'>
>>> type(str(u's'))
<type 'str'>
Don't use is for this, use ==. You're comparing whether the objects have the same identity, not whether they are equal. Of course, if they are the same object, they will be equal (==), but if they are equal, they aren't necessarily the same object.
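To make the identity/equality distinction concrete, here is a minimal sketch. It uses lists rather than strings on purpose: list() always builds a new object, whereas whether two equal strings share one object depends on the interpreter. The variable names are just for illustration.

```python
original = [1, 2, 3]
copy = list(original)   # a new list object with equal contents
alias = original        # a second name for the very same object

print(original == copy)   # True: same value
print(original is copy)   # False: two distinct objects
print(original is alias)  # True: one object, two names
```

Identical objects compare equal here, but equal objects are not necessarily identical.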
The fact that the first one works is an implementation detail of CPython. Small strings, since they're immutable, can be interned by the interpreter. Every time you put the string "s" in your source code, CPython reuses the same object. However, apparently str(u's') returns a new string with the same value. This isn't all that surprising.
You might be asking yourself, "Why intern the string 's' at all?" That's a reasonable question. After all, it's a short string, so how much memory could having multiple copies floating around in your source take? The answer (I think) is dictionary lookups. Since dicts with strings as keys are so common in Python, you can speed up the hash function/equality checking of keys by doing lightning-fast pointer comparisons first (falling back on the slower strcmp when the pointer comparison returns false).
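As a rough sketch of the idea above: interning can also be requested explicitly, through the builtin intern in Python 2 and sys.intern in Python 3. The helper keys_match below is hypothetical and only mimics the "pointer comparison first, value comparison as fallback" strategy; it is not CPython's actual dict code.

```python
try:
    intern_str = intern              # Python 2: builtin
except NameError:
    from sys import intern as intern_str  # Python 3: sys.intern


def keys_match(a, b):
    # Hypothetical helper: cheap identity check first,
    # slower value comparison only if that fails.
    return a is b or a == b


# Build two equal strings at runtime instead of writing literals.
a = "".join(["dictionary", "_", "key"])
b = "".join(["dictionary", "_", "key"])

ia = intern_str(a)
ib = intern_str(b)

print(a == b)              # True: equal values
print(ia is ib)            # True: interning yields one canonical object
print(keys_match(ia, ib))  # True, decided by the identity check alone
```

Whether a and b themselves are the same object is left to the interpreter, which is exactly why is should not be used to compare string values.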
The is operator compares the memory locations (identities) of its two operands. Since strings are immutable, CPython can reuse a single object for both 's' literals, so they occupy the same location in memory.
As @mgilson points out, 's' and u's' are of different types (str and unicode in Python 2.7), so they are separate objects that don't occupy the same memory location, and 's' is u's' evaluates to False.
Likewise, when you call str(u's'), a new string is created and returned. Because it is created anew, it lives in a new location in memory, which is why the is comparison fails.
What you really want is to check that they are equivalent strings, so use ==:
In [1]: 's' == u's'
Out[1]: True
In [2]: 's' == 's'
Out[2]: True
In [3]: 's' == str(u's')
Out[3]: True
Use == for value comparison and is for reference comparison. If two objects have the same id, is evaluates to True; otherwise, as with str(u's'), which returns a new object with a different id, you get False.
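The link between id and is can be checked directly. This is a small sketch with made-up names; it relies only on the fact that two objects alive at the same time never share an id.

```python
first = object()
alias = first       # another name bound to the same object
second = object()   # a distinct object, alive at the same time

print(id(first) == id(alias))   # True
print(first is alias)           # True
print(id(first) == id(second))  # False
print(first is second)          # False
```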