As far as I know, the CPython implementation reuses the same object for certain equal values in order to save memory. For example, when I create two strings with the value 'hello', CPython does not create two different PyObjects:
>>> s1 = 'hello'
>>> s2 = 'hello'
>>> s1 is s2
True
I have heard this is called string interning. When I tried to check other Python types, I observed that almost all hashable (immutable) types behave the same way:
>>> int() is int()
True
>>> str() is str()
True
>>> frozenset() is frozenset()
True
>>> bool() is bool()
True
And almost all mutable types behave in the opposite way (CPython creates a new PyObject even for equal values):
>>> list() is list()
False
>>> set() is set()
False
>>> dict() is dict()
False
And I think that's because CPython can reuse the same PyObject for immutable objects without causing any problems.
My question arises because the float type behaves differently from the other immutable types:
>>> float() is float()
False
Why is it different?
Mutable objects always create a new object; otherwise their data would be shared. There is not much to explain here: if you append an item to an empty list, you don't want every other empty list to receive that item.
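The aliasing problem can be seen directly: two separately created lists are independent, while two names bound to the *same* list share every mutation. A quick sketch:

```python
# Two separately-created empty lists are distinct objects,
# so mutating one does not affect the other.
a = []
b = []
a.append(1)
print(a, b)  # [1] []

# Binding two names to the *same* list shows why sharing
# one object for all "equal" lists would be a problem:
c = d = []
c.append(1)
print(c, d)  # [1] [1]
```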
Immutable objects behave in a completely different manner:
Strings get interned. If they are shorter than 20 characters, consist of alphanumeric characters, and are static (constants in the code, function names, etc.), they are cached and accessed through a special mapping reserved for them. This saves memory, but more importantly it allows faster comparison. Python performs many dictionary lookups under the hood that require string comparison. Being able to compare two strings, such as attribute or function names, by their memory address instead of their actual value is a significant runtime improvement.
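You can observe interning directly, and even force it with sys.intern. (Exactly which literals are interned automatically is a CPython implementation detail, so the first result below is not guaranteed by the language.)

```python
import sys

# Identifier-like string literals are typically interned by CPython.
a = "hello_world"
b = "hello_world"
print(a is b)  # True in CPython

# A string with punctuation may not be interned automatically,
# but sys.intern() guarantees a single shared copy.
x = sys.intern("hello, world!")
y = sys.intern("hello, world!")
print(x is y)  # True
```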
Booleans simply return the same object. Considering there are only two of them, it makes no sense to create them again and again.
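This one is actually guaranteed by the language, not just CPython: bool() always returns one of the two singletons, True or False:

```python
# bool is a singleton pair: every bool() call returns True or False,
# never a fresh object.
print(bool() is False)      # True: the default is the False singleton
print(bool(1) is True)      # True
print(bool([]) is bool(0))  # True: both are the same False object
```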
Small integers (from -5 to 256 by default) are also cached. These are used quite often, just about everywhere. Every time an integer in that range is needed, CPython simply returns the same object.
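The small-integer cache is easy to observe with values computed at run time (this is a CPython implementation detail, so the exact range could differ in other interpreters):

```python
# Results in the cached range (-5..256) come back as the cached object.
a, b = 200, 56
small = 256
print((a + b) is small)  # True on CPython: 256 is taken from the cache

# Outside the range, a fresh int object is created each time.
c, d = 1000, 1
big = 1001
print((c + d) is big)    # False on CPython: 1001 is not cached
```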
Floats, however, are not cached. Unlike integers, where the numbers 0-10 are extremely common, 1.0 isn't guaranteed to be used more often than 2.0 or 0.1. That's why float() simply returns a new float. The empty float() could have been optimized, and one could measure whether that brings any speed benefit, but it likely wouldn't make much of a difference.
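The difference is easy to demonstrate with a value computed at run time:

```python
half = 0.5
a, b = 1.0, 2.0

result = a / b
print(result == half)  # True: the values are equal
print(result is half)  # False on CPython: a new float object was created
```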
The confusion starts when float(0.0) is float(0.0) evaluates to True. Python has numerous optimizations built in:
First of all, constants are saved in each function's code object. 0.0 is 0.0 simply refers to the same object twice; it is a compile-time optimization.
Second of all, float(0.0) takes the 0.0 object and, since it is already a float (which is immutable), simply returns it. There is no need to create a new object if it is already a float.
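This short-circuit is easy to verify. (It is a CPython detail that float(x) returns x itself when x is exactly a float; other argument types produce a new object.)

```python
x = 3.14
print(float(x) is x)  # True: x is already a float, so it is returned as-is

y = 2                 # an int
print(float(y) is y)  # False: a new float object must be created
```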
Lastly, 1.0 + 1.0 is 2.0 also evaluates to True. The reason is that 1.0 + 1.0 is calculated at compile time, and the result references the same 2.0 constant object:
>>> import dis
>>> def test():
...     return 1.0 + 1.0 is 2.0
...
>>> dis.dis(test)
  2           0 LOAD_CONST               1 (2.0)
              2 LOAD_CONST               1 (2.0)
              4 IS_OP                    0
              6 RETURN_VALUE
As you can see, there is no addition operation. The function was compiled with the result pointing to the exact same constant object.
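The contrast with a genuine run-time addition makes this clear: when the operands arrive as variables, the constant folder cannot help, and a fresh float is produced:

```python
def add(a, b):
    # Computed at run time: no constant folding is possible here.
    return a + b

two = 2.0
result = add(1.0, 1.0)
print(result == two)  # True
print(result is two)  # False on CPython: run-time addition built a new float
```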
So while there is no float-specific optimization, three different generic optimizations come into play. Their combination is what ultimately determines whether it will be the same object or not.