
Python string identity: `is` and `in` statements [duplicate]

Tags: python, string

I had some problems getting this to work:

# Shortened for brevity
def _coerce_truth(word):
    TRUE_VALUES = ('true','1','yes')
    FALSE_VALUES = ('false','0','no')

    _word = word.lower().strip()
    print("t" in _word)  # debug output: substring check on the normalized word
    if _word in TRUE_VALUES:
        return True
    elif _word in FALSE_VALUES:
        return False

I discovered:

In [20]: "foo" is "Foo".lower()
Out[20]: False

In [21]: "foo" is "foo".lower()
Out[21]: False

In [22]: "foo" is "foo"
Out[22]: True

In [23]: "foo" is "foo".lower()
Out[23]: False

Why is this? I understand that identity is different from equality, but when is identity formed? I would expect statement 22 to be False unless, because strings are immutable, identity and equality coincide for them. But in that case statement 23 confuses me.

Please explain and thanks in advance.

Aaron Schif asked Sep 19 '13 20:09



2 Answers

Q. "When is identity formed?"

A. When the object is created.

What you're seeing is actually an implementation detail of CPython -- it caches (interns) certain strings, such as short literals like "foo", and reuses them for efficiency. Other interesting cases are:

"foo" is "foo".strip()  # True
"foo" is "foo"[:]       # True

Ultimately, what we see is that the string literal "foo" has been cached. Every time you type "foo", you're referencing the same object in memory. However, some string methods will choose to always create new objects (like .lower()) and some will smartly re-use the input string if the method made no changes (like .strip()).
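
Here is a small sketch of the distinction, written for Python 3 and assuming CPython. The variable names are made up for illustration. The identity results are implementation details, so only the equality and sys.intern() results are guaranteed:

```python
import sys

literal = "foo"
lowered = "FOO".lower()      # built at run time by a string method
stripped = literal.strip()   # nothing to strip here

equal = literal == lowered           # True: same characters
same_object = literal is lowered     # typically False in CPython; not guaranteed
strip_reused = stripped is literal   # typically True in CPython; not guaranteed

# sys.intern() returns the canonical cached copy of a string, so two equal
# strings always map to the same interned object.
interned_same = sys.intern(lowered) is sys.intern(literal)

print(equal, same_object, strip_reused, interned_same)
```

For comparing values, as in _coerce_truth, == (or in) is the right tool; is only answers whether two names refer to the same object.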


One benefit of this is that string equality can be implemented as a pointer comparison (blazingly fast), followed by a character-by-character comparison only if the pointers differ. If the pointers match, the character-by-character comparison can be skipped.
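
That shortcut can be modelled in pure Python roughly as follows. This is only an illustration of the idea, not CPython's actual C implementation; str_equal is a hypothetical name, and the length check is an extra step added here so the character loop stays correct:

```python
def str_equal(a, b):
    """Illustrative model of equality with an identity fast path."""
    if a is b:
        # Same object in memory: no need to look at the characters.
        return True
    if len(a) != len(b):
        # Different lengths can never be equal.
        return False
    # Fall back to comparing character by character.
    return all(x == y for x, y in zip(a, b))


runtime_foo = "FOO".lower()  # equal to "foo", usually a separate object
print(str_equal("foo", runtime_foo))
print(str_equal("foo", "fob"))
```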

mgilson answered Sep 16 '22 14:09


As for the relation between is and in:

When looking for a match, the __contains__ method (which is what the in operator calls) for tuple and list first checks identity and, if that fails, checks equality. This gives you sane results even with objects that don't compare equal to themselves:

>>> x = float("NaN")
>>> t = (1, 2, x)
>>> x in t
True
>>> any(x == e for e in t)  # this might be surprising
False
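
Roughly, the membership test for a tuple or list behaves like the helper below. This is a sketch of the behaviour in Python 3, not the actual C code, and contains is a hypothetical name:

```python
def contains(container, item):
    """Model of `item in container`: identity first, then equality."""
    return any(e is item or e == item for e in container)


nan = float("nan")
t = (1, 2, nan)

found_same_object = contains(t, nan)            # identity check succeeds
found_other_nan = contains(t, float("nan"))     # new NaN object: neither check succeeds
print(found_same_object, found_other_nan)
```

The second call fails because a freshly created NaN is neither the same object as the one stored in t nor equal to it.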
lqc answered Sep 17 '22 14:09