In Python, when are two objects the same?

It seems that 2 is 2 and 3 is 3 will always be true in Python, and in general, any reference to an integer is the same as any other reference to the same integer. The same holds for None (i.e., None is None). I know that this does not hold for user-defined types or mutable types. But it sometimes fails on immutable types too:

>>> () is ()
True
>>> (2,) is (2,)
False

That is: two independent constructions of the empty tuple yield references to the same object in memory, but two independent constructions of identical one-(immutable-)element tuples end up creating two separate (though equal) objects. I tested, and frozensets work in a manner similar to tuples.

What determines if an object will be duplicated in memory or will have a single instance with lots of references? Does it depend on whether the object is "atomic" in some sense? Does it vary according to implementation?

406

asked Apr 27 '16 19:04

fonini

2 Answers

Python has some types that it guarantees will only have one instance. Examples of these instances are None, NotImplemented, and Ellipsis. These are (by definition) singletons and so things like None is None are guaranteed to return True because there is no way to create a new instance of NoneType.
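A minimal sketch of the singleton claim, assuming Python 3 (the variable names are only illustrative). Calling the type of each singleton hands back the existing object rather than building a new one:

```python
# Each of these types has exactly one instance. Calling the type
# returns that same instance instead of creating a new one (Python 3).
singletons = [None, NotImplemented, Ellipsis]

for obj in singletons:
    again = type(obj)()  # e.g. type(None)() evaluates to None
    print(type(obj).__name__, again is obj)
```

Every line printed should end in True, which is why identity checks against these objects are safe on any conforming implementation.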

It also supplies a few doubletons¹: True and False². All references to True point to the same object. Again, this is because there is no way to create a new instance of bool.
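The same idea can be checked for bool. This is only a sketch; `values` and `MyBool` are made-up names. It shows that bool() always returns one of the two existing objects, and that bool cannot be subclassed to sneak in a third:

```python
# bool() never builds a new object; it returns one of the two existing ones.
values = [1, "text", [0], 0, "", []]
results = [bool(v) for v in values]
print(all(r is True or r is False for r in results))

# bool cannot be subclassed, so no extra instance can be created that way.
try:
    class MyBool(bool):
        pass
    subclassable = True
except TypeError:
    subclassable = False

print(subclassable)
```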

The above things are all guaranteed by the Python language. However, as you have noticed, there are some types (all immutable) that store some instances for reuse. This is allowed by the language, but different implementations may choose to use this allowance or not, depending on their optimization strategies. Some examples that fall into this category are small integers (-5 to 256 in CPython), the empty tuple and the empty frozenset.
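To see which of these caches your own interpreter uses, something like the following probe may help. It is a sketch, `built_twice` is a hypothetical helper, and the answers are implementation details that can differ between interpreters and versions, so no expected output is given:

```python
import platform

def built_twice(text):
    """Parse the same text twice and report whether both results are one object."""
    first = int(text)
    second = int(text)
    return first is second

print(platform.python_implementation())
print("small int shared:", built_twice("7"))
print("large int shared:", built_twice("123456789012345678901234567890"))
print("empty tuple shared:", tuple() is tuple())
print("empty frozenset shared:", frozenset() is frozenset())
```

The values are built at runtime (rather than written as literals) so that the compiler cannot merge them into a single constant first.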

Finally, CPython interns certain immutable objects during parsing...

e.g. if you run the following script with CPython, you'll see that it prints True:

def foo():
    return (2,)

if __name__ == '__main__':
    print(foo() is foo())

This seems really odd. The trick that CPython is playing is that whenever it constructs the function foo, it sees a tuple literal that contains other simple (immutable) literals. Rather than create this tuple (or its equivalents) over and over, Python just creates it once. There's no danger of that object being changed since the whole deal is immutable. This can be a big win for performance where the same tight loop is run over and over. Small strings are interned as well. The real win here is in dictionary lookups. Python can do a (blazingly fast) pointer compare and only then fall back on slower string comparisons when checking hash collisions. Since so much of Python is built on dictionary lookups, this can be a big optimization for the language as a whole.
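One way to peek at this is to look at the constants stored on the compiled function. This is a sketch that assumes CPython, where a function's constants are exposed as `__code__.co_consts`; other implementations may organise this differently:

```python
def foo():
    return (2,)

# CPython-specific: constants of a compiled function live on its code object.
constants = foo.__code__.co_consts
print(constants)

# On CPython, every call returns the one tuple stored there.
print(any(c is foo() for c in constants))
```

If the second line prints True, the tuple returned by foo() is the very object the compiler stored, not a fresh one built per call.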

¹ I might have just made up that word ... But hopefully you get the idea...
² Under normal circumstances, you don't need to check if the object is a reference to True. Usually you just care if the object is "truthy", e.g. if some_instance: ... will execute the branch. But I put that in here just for completeness.

Note that is can be used to compare things that aren't singletons. One common use is to create a sentinel value:

sentinel = object()
item = next(iterator, sentinel)
if item is sentinel:
    # iterator exhausted.
    ...

Or:

_sentinel = object()

def function(a, b, none_is_ok_value_here=_sentinel):
    if none_is_ok_value_here is _sentinel:
        # Treat the function as if `none_is_ok_value_here` was not provided.
        ...
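For a version you can actually run, here is a small sketch of both sentinel patterns. The names `_MISSING`, `describe` and `first_item` are made up for illustration:

```python
_MISSING = object()  # hypothetical sentinel, unique to this module

def describe(value=_MISSING):
    """Distinguish 'no argument given' from an explicit None."""
    if value is _MISSING:
        return "no value provided"
    if value is None:
        return "explicit None"
    return "value: {!r}".format(value)

def first_item(iterable, default=None):
    """Return the first item of iterable, or default if it is empty."""
    marker = object()  # fresh object, so it cannot collide with real data
    item = next(iter(iterable), marker)
    return default if item is marker else item

print(describe())
print(describe(None))
print(first_item([], default="empty"))
print(first_item([None, 1]))
```

Because each sentinel is a brand-new object(), nothing a caller passes in can be identical to it, which is exactly what makes the `is` test reliable here.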

The moral of this story is to always say what you mean. If you want to check if a value is another value, then use the is operator. If you want to check if a value is equal to another value (but possibly distinct), then use ==. For more details on the difference between is and == (and when to use which), consult one of the following posts:
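A short illustration of the difference, using a mutable type so that nothing is cached behind the scenes (the names a, b and c are just for the example):

```python
a = [1, 2, 3]
b = list(a)  # a new list with equal contents
c = a        # another name for the very same list

print(a == b, a is b)  # True False: equal, but distinct objects
print(a == c, a is c)  # True True: equal, and the same object

c.append(4)            # mutating through one name is visible through the other
print(a, b)            # [1, 2, 3, 4] [1, 2, 3]
```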

Is there a difference between `==` and `is` in Python?
Python None comparison: should I use "is" or ==?

Addendum

We've talked about these CPython implementation details and we've claimed that they're optimizations. It'd be nice to try to measure just what we get from all this optimizing (other than a little added confusion when working with the is operator).

String "interning" and dictionary lookups.

Here's a small script that you can run to see how much faster dictionary lookups are if you use the same string to look up the value instead of a different string. Note, I use the term "interned" in the variable names -- These values aren't necessarily interned (though they could be). I'm just using that to indicate that the "interned" string is the string in the dictionary.

import timeit

interned = 'foo'
not_interned = (interned + ' ').strip()

assert interned is not not_interned


d = {interned: 'bar'}

print('Timings for short strings')
number = 100000000
print(timeit.timeit(
    'd[interned]',
    setup='from __main__ import interned, d',
    number=number))
print(timeit.timeit(
    'd[not_interned]',
    setup='from __main__ import not_interned, d',
    number=number))


####################################################

interned_long = interned * 100
not_interned_long = (interned_long + ' ').strip()

d[interned_long] = 'baz'

assert interned_long is not not_interned_long
print('Timings for long strings')
print(timeit.timeit(
    'd[interned_long]',
    setup='from __main__ import interned_long, d',
    number=number))
print(timeit.timeit(
    'd[not_interned_long]',
    setup='from __main__ import not_interned_long, d',
    number=number))

The exact values here shouldn't matter too much, but on my computer, lookups with the short strings are about 1 part in 7 faster when the same string object is used. With the long strings they are almost 2x faster (because the string comparison takes longer if the string has more characters to compare). The differences aren't quite as striking on Python 3.x, but they're still definitely there.
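If you want the identity fast path on purpose rather than by accident, the standard library offers sys.intern. The sketch below (variable names are illustrative) builds two equal strings at runtime and then interns them:

```python
import sys

# Build two equal strings at runtime so they are not shared compile-time constants.
parts = ["inter", "ned key"]
first = "".join(parts)
second = "".join(parts)

print(first == second)  # equal by value
print(first is second)  # implementation detail; usually two separate objects

# sys.intern returns the canonical copy from the interpreter's intern table.
first_i = sys.intern(first)
second_i = sys.intern(second)
print(first_i is second_i)
```

After interning, both names refer to one object, so a dictionary keyed on it can match by identity before it ever compares characters.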

Tuple "interning"

Here's a small script you can play around with:

import timeit

def foo_tuple():
    return (2, 3, 4)

def foo_list():
    return [2, 3, 4]

assert foo_tuple() is foo_tuple()

number = 10000000
t_interned_tuple = timeit.timeit('foo_tuple()', setup='from __main__ import foo_tuple', number=number)
t_list = (timeit.timeit('foo_list()', setup='from __main__ import foo_list', number=number))

print(t_interned_tuple)
print(t_list)
print(t_interned_tuple / t_list)
print('*' * 80)


def tuple_creation(x):
    return (x,)

def list_creation(x):
    return [x]

t_create_tuple = timeit.timeit('tuple_creation(2)', setup='from __main__ import tuple_creation', number=number)
t_create_list = timeit.timeit('list_creation(2)', setup='from __main__ import list_creation', number=number)
print(t_create_tuple)
print(t_create_list)
print(t_create_tuple / t_create_list)

This one is a bit trickier to time (and I'm happy to take any better ideas how to time it in comments). The gist of this is that on average (and on my computer), a tuple takes about 60% as long to create as a list does. However, foo_tuple() takes on average about 40% the time that foo_list() takes. That shows that we really do gain a little bit of a speedup from these interns. The time savings seem to increase as the tuple gets larger (creating a longer list takes longer -- The tuple "creation" takes constant time since it was already created).
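One possible way to make the timing a little less noisy, offered only as a sketch: repeat each measurement several times with timeit.repeat and keep the fastest run, since the minimum is the run least disturbed by other activity on the machine. The helper `best_time` and its default counts are assumptions, not part of the original script:

```python
import timeit

def foo_tuple():
    return (2, 3, 4)

def foo_list():
    return [2, 3, 4]

def best_time(func, number=100000, repeat=5):
    """Return the fastest of several timing runs, in seconds."""
    return min(timeit.repeat(func, number=number, repeat=repeat))

t_tuple = best_time(foo_tuple)
t_list = best_time(foo_list)
print("tuple/list ratio: {:.2f}".format(t_tuple / t_list))
```

Passing the function object directly avoids the `from __main__ import ...` setup string, though it still includes the call overhead in both measurements.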

Also note that I've called this "interning". It actually isn't (at least not in the same sense the strings are interned). We can see the difference in this simple script:

def foo_tuple():
    return (2,)

def bar_tuple():
    return (2,)

def foo_string():
    return 'foo'

def bar_string():
    return 'foo'

print(foo_tuple() is foo_tuple())  # True
print(foo_tuple() is bar_tuple())  # False

print(foo_string() is bar_string())  # True

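We see that the strings are really "interned": different invocations using the same literal notation return the same object. The tuple "interning" seems to be specific to a single function, since the tuple is stored as a constant with that function's compiled code. These are implementation details, so the exact results may differ between CPython versions.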

111

answered Oct 10 '22 16:10

mgilson

It varies according to implementation.

CPython caches some immutable objects in memory. This is true of "small" integers like 1 and 2 (-5 to 256). CPython does this for performance reasons; small integers are commonly used in most programs, so it saves memory to only have one copy created (and is safe because integers are immutable).

This is also true of "singleton" objects like None; there is only ever one None in existence at any given time.

Other objects (such as the empty tuple, ()) may be implemented as singletons, or they may not be.

In general, you shouldn't assume that immutable objects will be implemented this way. CPython does so for performance reasons, but other implementations may not, and CPython may even stop doing it at some point in the future. (The only exception might be None, as x is None is a common Python idiom and is likely to be implemented across different interpreters and versions.)

Usually you want to use == instead of is. Python's is operator isn't used often, except when checking to see if a variable is None.
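One reason `is None` is the preferred spelling: `==` can be overridden by a class, while identity cannot. A small sketch with a deliberately badly behaved class (`AlwaysEqual` is hypothetical):

```python
class AlwaysEqual:
    """Hypothetical class whose == answers True for everything."""

    def __eq__(self, other):
        return True

obj = AlwaysEqual()

print(obj == None)  # True: __eq__ was consulted, and it claims equality
print(obj is None)  # False: identity cannot be overridden
```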

answered Oct 10 '22 17:10

mipadi


Tags: python, object, oop, reference, python-3.x