
In Python, when are two objects the same?

It seems that 2 is 2 and 3 is 3 will always be true in Python, and in general, any reference to an integer is the same as any other reference to the same integer. The same holds for None (i.e., None is None). I know that this does not hold for user-defined types or mutable types. But it sometimes fails on immutable types too:

>>> () is ()
True
>>> (2,) is (2,)
False

That is: two independent constructions of the empty tuple yield references to the same object in memory, but two independent constructions of identical one-(immutable-)element tuples end up creating two identical objects. I tested, and frozensets work in a manner similar to tuples.

What determines if an object will be duplicated in memory or will have a single instance with lots of references? Does it depend on whether the object is "atomic" in some sense? Does it vary according to implementation?

fonini asked Apr 27 '16 19:04



2 Answers

Python has some types that it guarantees will only have one instance. Examples of these instances are None, NotImplemented, and Ellipsis. These are (by definition) singletons and so things like None is None are guaranteed to return True because there is no way to create a new instance of NoneType.

It also supplies a few doubletons[1] (True, False)[2]. All references to True point to the same object. Again, this is because there is no way to create a new instance of bool.
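As a rough illustration of the guarantees above, here is a minimal sketch (assuming Python 3, where the types of these singletons can be called directly). The name MyBool is made up for the example.

```python
# Calling the type of each singleton hands back the existing object
# rather than building a new one (Python 3 behaviour).
singletons = [None, NotImplemented, Ellipsis]
for obj in singletons:
    assert type(obj)() is obj

# bool() never creates a new object: it returns True or False.
truthy_results = [bool(1), bool("x"), not 0]
falsy_results = [bool(0), bool(""), not 1]
assert all(r is True for r in truthy_results)
assert all(r is False for r in falsy_results)

# bool cannot be subclassed, so no third instance can appear that way.
try:
    class MyBool(bool):  # hypothetical name, only used for this check
        pass
    subclass_allowed = True
except TypeError:
    subclass_allowed = False

print(subclass_allowed)
```

If the assertions pass silently, every route tried here led back to the same handful of objects.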

The above things are all guaranteed by the Python language. However, as you have noticed, there are some types (all immutable) that store some instances for reuse. This is allowed by the language, but different implementations may choose to use this allowance or not, depending on their optimization strategies. Some examples that fall into this category are small integers (-5 through 256 in CPython), the empty tuple and the empty frozenset.
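One way to probe this on your own interpreter is sketched below. The helper name same_object_when_rebuilt is invented for this example, and the values are built at runtime so that the compiler cannot share literals behind the scenes. The results are implementation details, so no particular output is promised.

```python
def same_object_when_rebuilt(value):
    """Build the int twice at runtime and report whether both are one object."""
    text = str(value)
    first = int(text)
    second = int(text)
    return first is second

# Probe values on either side of the range CPython is known to cache.
for n in (-6, -5, 0, 256, 257):
    print(n, same_object_when_rebuilt(n))

# The same kind of probe for the empty tuple.
empty_a = tuple([])
empty_b = tuple([])
print("empty tuple reused:", empty_a is empty_b)
```

On an implementation that caches nothing, every line would simply report False, and that would still be valid Python behaviour.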

Finally, CPython interns certain immutable objects during parsing...

e.g. if you run the following script with CPython, you'll see that it prints True:

def foo():
    return (2,)

if __name__ == '__main__':
    print(foo() is foo())

This seems really odd. The trick that CPython is playing is that whenever it compiles the function foo, it sees a tuple literal that contains other simple (immutable) literals. Rather than create this tuple (or its equivalents) over and over, Python just creates it once. There's no danger of that object being changed since the whole thing is immutable. This can be a big win for performance where the same tight loop is called over and over.

Small strings are interned as well. The real win here is in dictionary lookups. Python can do a (blazingly fast) pointer compare and then fall back on slower string comparisons when checking hash collisions. Since so much of Python is built on dictionary lookups, this can be a big optimization for the language as a whole.
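To see where the tuple ends up, one option is to look at the constants stored on the function's code object. This is a sketch that relies on the CPython-specific attribute __code__.co_consts; what it contains can differ between versions and implementations.

```python
def foo():
    return (2,)

# Constants that the compiler stored on foo's code object (CPython detail).
consts = foo.__code__.co_consts
print(consts)

# If the tuple was built at compile time, it shows up among the constants,
# and every call hands back that one stored object.
tuple_is_constant = any(isinstance(c, tuple) and c == (2,) for c in consts)
print(tuple_is_constant)
print(foo() is foo())
```

When tuple_is_constant is True, the identity check on the last line is just comparing the stored constant with itself.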


[1] I might have just made up that word... but hopefully you get the idea.
[2] Under normal circumstances, you don't need to check if the object is a reference to True. Usually you just care if the object is "truthy", e.g. whether if some_instance: ... will execute the branch. But I put that in here just for completeness.


Note that is can be used to compare things that aren't singletons. One common use is to create a sentinel value:

sentinel = object()
iterable = iter([])  # any iterator; empty here so the sentinel comes back
item = next(iterable, sentinel)
if item is sentinel:
    # iterable exhausted.
    pass

Or:

_sentinel = object()

def function(a, b, none_is_ok_value_here=_sentinel):
    if none_is_ok_value_here is _sentinel:
        # Treat the function as if `none_is_ok_value_here` was not provided.
        pass

The moral of this story is to always say what you mean. If you want to check if a value is another value, then use the is operator. If you want to check if a value is equal to another value (but possibly distinct), then use ==. For more details on the difference between is and == (and when to use which), consult one of the following posts:

  • Is there a difference between `==` and `is` in Python?
  • Python None comparison: should I use "is" or ==?
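The difference between the two operators can be seen with a small sketch using lists, where separate constructions are always separate objects:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a second, separately built list
c = a           # another name for the same list

print(a == b)   # True: equal values
print(a is b)   # False: distinct objects
print(a is c)   # True: same object

c.append(4)
print(a)        # [1, 2, 3, 4]: the change made through c is visible through a
```

Because a and c are the same object, mutating one "changes" the other, while b is unaffected.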

Addendum

We've talked about these CPython implementation details and we've claimed that they're optimizations. It'd be nice to try to measure just what we get from all this optimizing (other than a little added confusion when working with the is operator).

String "interning" and dictionary lookups.

Here's a small script that you can run to see how much faster dictionary lookups are if you use the same string to look up the value instead of a different string. Note, I use the term "interned" in the variable names -- These values aren't necessarily interned (though they could be). I'm just using that to indicate that the "interned" string is the string in the dictionary.

import timeit

interned = 'foo'
not_interned = (interned + ' ').strip()

assert interned is not not_interned


d = {interned: 'bar'}

print('Timings for short strings')
number = 100000000
print(timeit.timeit(
    'd[interned]',
    setup='from __main__ import interned, d',
    number=number))
print(timeit.timeit(
    'd[not_interned]',
    setup='from __main__ import not_interned, d',
    number=number))


####################################################

interned_long = interned * 100
not_interned_long = (interned_long + ' ').strip()

d[interned_long] = 'baz'

assert interned_long is not not_interned_long
print('Timings for long strings')
print(timeit.timeit(
    'd[interned_long]',
    setup='from __main__ import interned_long, d',
    number=number))
print(timeit.timeit(
    'd[not_interned_long]',
    setup='from __main__ import not_interned_long, d',
    number=number))

The exact values here shouldn't matter too much, but on my computer, the short-string lookups are about 1 part in 7 faster when the same string object is used. The long-string lookups are almost 2x faster (because the string comparison takes longer if the string has more characters to compare). The differences aren't quite as striking on Python 3.x, but they're still definitely there.
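If you want a runtime-built string to share identity with an equal one, the standard library's sys.intern can be used to request it explicitly. The following is a small sketch; the variable names are made up for the example.

```python
import sys

# Build two equal strings at runtime so the compiler cannot share them.
parts = ["dictionary", "_", "key"]
first = "".join(parts)
second = "".join(parts)
print(first == second)   # equal values
print(first is second)   # identity here is an implementation detail

# sys.intern returns the canonical copy for a given string value.
interned_first = sys.intern(first)
interned_second = sys.intern(second)
print(interned_first is interned_second)
```

Using the interned objects as dictionary keys lets lookups succeed on the pointer comparison alone, which is the effect the timings above are measuring.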

Tuple "interning"

Here's a small script you can play around with:

import timeit

def foo_tuple():
    return (2, 3, 4)

def foo_list():
    return [2, 3, 4]

assert foo_tuple() is foo_tuple()

number = 10000000
t_interned_tuple = timeit.timeit('foo_tuple()', setup='from __main__ import foo_tuple', number=number)
t_list = (timeit.timeit('foo_list()', setup='from __main__ import foo_list', number=number))

print(t_interned_tuple)
print(t_list)
print(t_interned_tuple / t_list)
print('*' * 80)


def tuple_creation(x):
    return (x,)

def list_creation(x):
    return [x]

t_create_tuple = timeit.timeit('tuple_creation(2)', setup='from __main__ import tuple_creation', number=number)
t_create_list = timeit.timeit('list_creation(2)', setup='from __main__ import list_creation', number=number)
print(t_create_tuple)
print(t_create_list)
print(t_create_tuple / t_create_list)

This one is a bit trickier to time (and I'm happy to take any better ideas how to time it in comments). The gist of this is that on average (and on my computer), a tuple takes about 60% as long to create as a list does. However, foo_tuple() takes on average about 40% the time that foo_list() takes. That shows that we really do gain a little bit of a speedup from these interns. The time savings seem to increase as the tuple gets larger (creating a longer list takes longer -- The tuple "creation" takes constant time since it was already created).

Also note that I've called this "interning". It actually isn't (at least not in the same sense the strings are interned). We can see the difference in this simple script:

def foo_tuple():
    return (2,)

def bar_tuple():
    return (2,)

def foo_string():
    return 'foo'

def bar_string():
    return 'foo'

print(foo_tuple() is foo_tuple())  # True
print(foo_tuple() is bar_tuple())  # False

print(foo_string() is bar_string())  # True

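We see that the strings are really "interned": different functions using the same literal notation return the same object. The tuple "interning" seems to be specific to a single function, at least on the CPython version I tested. These results are implementation details, and other versions may share equal constants more widely.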

mgilson answered Oct 10 '22 16:10


It varies according to implementation.

CPython caches some immutable objects in memory. This is true of "small" integers like 1 and 2 (-5 to 256, as noted in the comments below). CPython does this for performance reasons; small integers are commonly used in most programs, so it saves memory to only have one copy created (and is safe because integers are immutable).

This is also true of "singleton" objects like None; there is only ever one None in existence at any given time.

Other objects (such as the empty tuple, ()) may be implemented as singletons, or they may not be.

In general, you shouldn't necessarily assume that immutable objects will be implemented this way. CPython does so for performance reasons, but other implementations may not, and CPython may even stop doing it at some point in the future. (The only exception might be None, as x is None is a common Python idiom and is likely to be implemented across different interpreters and versions.)

Usually you want to use == instead of is. Python's is operator isn't used often, except when checking to see if a variable is None.
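One reason the None check is written with is rather than == can be sketched with a class that overrides equality. AlwaysEqual is a hypothetical class made up for this example.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True

obj = AlwaysEqual()
print(obj == None)   # True: the class's __eq__ was consulted
print(obj is None)   # False: identity cannot be overridden
```

Since identity checks never call user code, x is None gives the same answer no matter how x defines equality.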

mipadi answered Oct 10 '22 17:10