In Python:
>>> a = 5
>>> a is 5
True
but
>>> a = 500
>>> a is 500
False
This is because CPython stores small integers as shared objects at a single address, while each larger integer gets its own object at its own address. This makes sense to me.
The current implementation keeps an array of integer objects for all integers between -5 and 256; when you create an int in that range, you actually just get back a reference to the existing object.
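A quick way to see this cache in action is to build the integers at run time, so the compiler cannot merge equal literals. The sketch below assumes CPython; the helper name shares_object is made up for illustration, and the exact cache bounds are an implementation detail that may differ between versions.

```python
def shares_object(text):
    """Parse the same decimal text twice and report whether both results are one object."""
    first = int(text)
    second = int(text)
    return first is second


# On CPython, small values come from the preallocated cache,
# while a large value yields a fresh object on every call.
for text in ("-5", "5", "256", "1000000"):
    print(text, shares_object(text))
```

Parsing from text rather than writing a is 5 keeps the comparison honest, because equal literals inside one compiled block may be merged into a single constant regardless of their size.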
So now, why does this not apply to strings? Are strings not just as complex as large integers (if not more so)?
>>> a = '1234567'
>>> a is '1234567'
True
How does Python use the same address for all string literals efficiently? It can't keep an array of every possible string like it does for numbers.
It's an optimisation technique called interning. CPython recognises equal string constants and, instead of allocating memory for each new instance, points them all to the same object (interns it), so they share the same id().
One can play around to confirm that only constants are treated this way (simple constant expressions like the concatenation assigned to b are recognised):
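# Two string constants
a = "aaaa"
b = "aa" + "aa"  # folded into the constant "aaaa" at compile time
# Prevent interpreter from figuring out string constant
c = "aaa"
c += "a"  # built at run time, so not interned automatically
print(id(a))  # 4509752320
print(id(b))  # 4509752320
print(id(c))  # 4509752176 !!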
However, you can manually force a string to be mapped to an already existing one using intern() (sys.intern() in Python 3):
import sys
c = sys.intern(c)  # plain intern(c) in Python 2
print(id(a))  # 4509752320
print(id(b))  # 4509752320
print(id(c))  # 4509752320 !!
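This is a CPython implementation detail, and other interpreters may do it differently. Since strings are immutable, sharing one object is safe: "changing" one of the two names creates a new string and does not affect the other.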
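To illustrate the last point, the sketch below interns two equal strings so that both names refer to one object, then "changes" one of them. It assumes Python 3, where intern() lives in the sys module; the variable names are arbitrary.

```python
import sys

# Build the strings at run time, then intern them explicitly.
a = sys.intern("".join(["aa", "aa"]))
b = sys.intern("".join(["aa", "aa"]))
print(a is b)   # True: both names refer to the interned object

b += "!"        # strings are immutable, so this builds a new string and rebinds b
print(a)        # aaaa
print(b)        # aaaa!
print(a is b)   # False
```

Because identity depends on whether and when a string was interned, == is the right operator for comparing string values; is only answers whether two names refer to the same object.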