How does Python store strings so that the 'is' operator works on literals?

Tags: python, string

In Python:

>>> a = 5
>>> a is 5
True

but

>>> a = 500
>>> a is 500
False

This is because CPython stores each small integer at a single address. Once the numbers get larger, each new int is a separate object with its own address. This makes sense to me.

The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object.
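One way to observe this cache directly is to create the integers at run time with int(), so that the compiler cannot merge two equal literals into one shared constant. This is a minimal sketch that relies on a CPython implementation detail (the -5 to 256 range quoted above); the helper name shares_identity is made up for illustration, and other interpreters may behave differently.

```python
def shares_identity(text):
    """Parse the same text twice and report whether both ints are one object."""
    # int() runs at run time, so the compiler cannot fold the two values
    # into a single shared constant; any sharing must come from the cache.
    return int(text) is int(text)


print(shares_identity("5"))    # True in CPython: 5 comes from the small-int cache
print(shares_identity("500"))  # False in CPython: two distinct objects
```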

So why does this not apply to strings? Aren't strings just as complex as large integers (if not more so)?

>>> a = '1234567'
>>> a is '1234567'
True

How does Python efficiently reuse the same address for equal string literals? It can't keep an array of every possible string the way it does for small numbers.

Nick Humrich asked Oct 19 '22 00:10


1 Answer

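It's an optimisation technique called interning. When CPython sees string constants with equal values, it doesn't allocate a new object for each of them; it simply points them all at the same instance (it interns it), so they share the same id(). CPython's compiler automatically interns string constants made up only of ASCII letters, digits and underscores, which is why '1234567' in the question is shared.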
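Interning does not require a table of every possible string: only strings that actually get interned are stored, so the table grows with the number of distinct strings seen. CPython keeps its interned strings in an internal dictionary; the class below is a hypothetical, simplified imitation of that idea (the names InternTable and intern are made up here), not CPython's actual code.

```python
class InternTable:
    """Hypothetical, simplified intern table (not CPython's implementation).

    Only strings passed to intern() are stored, so memory use depends on the
    number of distinct strings seen, not on every possible string.
    """

    def __init__(self):
        self._canonical = {}  # maps a string value to its canonical object

    def intern(self, s):
        # setdefault stores s the first time its value is seen and afterwards
        # returns the object that was stored first for that value.
        return self._canonical.setdefault(s, s)

    def __len__(self):
        return len(self._canonical)


table = InternTable()
first = table.intern("".join(["12", "34567"]))   # built at run time
second = table.intern("".join(["1234", "567"]))  # equal value, separate object
print(first is second)  # True: both names now refer to the first stored object
print(len(table))       # 1
```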

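One can play around to confirm that only compile-time constants are treated this way. Simple constant expressions like the one assigned to b are recognised too, because the compiler folds them into a single constant:

# Two string constants ("aa" + "aa" is folded into "aaaa" at compile time)
a = "aaaa"
b = "aa" + "aa"

# Build the string at run time so the compiler cannot fold it into a constant
c = "aaa"
c += "a"

print(id(a))        # 4509752320
print(id(b))        # 4509752320
print(id(c))        # 4509752176 !!

However, you can manually force a string to be mapped to an already existing one using intern() (a builtin in Python 2, moved to sys.intern() in Python 3):

from sys import intern  # Python 3 only; in Python 2, intern() is a builtin

c = intern(c)

print(id(a))        # 4509752320
print(id(b))        # 4509752320
print(id(c))        # 4509752320 !!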
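A practical consequence of intern() is that many equal strings built at run time can be collapsed onto a single object each, which saves memory and makes later comparisons cheap. The sketch below uses sys.intern() from Python 3; the word list is made-up data for illustration, and the identity counts reflect CPython's behaviour.

```python
import sys

# Made-up data: six strings built at run time, but only three distinct values.
words = ["abab" + str(i % 3) for i in range(6)]
print(len({id(w) for w in words}))     # 6: each concatenation made a new object

# sys.intern() returns one canonical object per distinct value.
interned = [sys.intern(w) for w in words]
print(len({id(w) for w in interned}))  # 3: one shared object per value
print(interned[0] is interned[3])      # True: both are the interned "abab0"
```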

Other interpreters may do it differently. Since strings are immutable, changing one of the two will not change the other.
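A small sketch of why sharing one object is safe: because strings are immutable, an operation such as += never modifies the shared object; it builds a new string and rebinds only the name on the left.

```python
a = "aaaa"
b = a            # both names refer to the same string object
print(a is b)    # True

a += "b"         # builds a new string and rebinds a; b is left untouched
print(a, b)      # aaaab aaaa
print(a is b)    # False
```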

Nils Werner answered Oct 31 '22 21:10