
Python string interning

While this question doesn't have any real use in practice, I am curious as to how Python does string interning. I have noticed the following.

>>> "string" is "string"
True

This is as I expected.

You can also do this.

>>> "strin"+"g" is "string"
True

And that's pretty clever!

But you can't do this.

>>> s1 = "strin"
>>> s2 = "string"
>>> s1+"g" is s2
False

Why wouldn't Python evaluate s1+"g", and realize it is the same as s2 and point it to the same address? What is actually going on in that last block to have it return False?

Ze'ev G asked Mar 21 '13 07:03



2 Answers

This is implementation-specific, but your interpreter is probably interning compile-time constants but not the results of run-time expressions.

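The examples below use CPython 3.9.0+. Opcode names and offsets vary between versions; for example, CPython 3.11 and later emit BINARY_OP where the listings below show BINARY_ADD.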

In the second example, the expression "strin"+"g" consists only of literals, so the compiler evaluates it at compile time (constant folding) and replaces it with "string". This makes the first two examples behave the same.

If we examine the bytecode, we'll see that the two statements compile to exactly the same instructions:

  # s1 = "string"
  1           0 LOAD_CONST               0 ('string')
              2 STORE_NAME               0 (s1)

  # s2 = "strin" + "g"
  2           4 LOAD_CONST               0 ('string')
              6 STORE_NAME               1 (s2)

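This bytecode was obtained with the following code, which prints a few more lines after those shown above:

import dis

source = 's1 = "string"\ns2 = "strin" + "g"'
code = compile(source, '', 'exec')
# dis.dis() prints the disassembly itself and returns None,
# so it does not need to be wrapped in print().
dis.dis(code)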
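As a complementary check, here is a minimal sketch that looks at the constant table of the compiled code instead of the disassembly. It assumes CPython, where constant folding is an implementation detail; the names `folded` and `unfolded` and the `"<example>"` filename are made up for illustration.

```python
# Assumption: CPython, whose compiler folds constant expressions.
# Other implementations may not fold, so treat the results as illustrative.

# Both operands are literals, so the compiler can fold the concatenation.
folded = compile('s2 = "strin" + "g"', "<example>", "exec")

# One operand is a name, so the concatenation is left for run time.
unfolded = compile('s3a = "strin"\ns3 = s3a + "g"', "<example>", "exec")

print(folded.co_consts)    # constant table of the folded version
print(unfolded.co_consts)  # constant table of the run-time version

# Is the complete string already present as a compile-time constant?
has_folded_literal = "string" in folded.co_consts
has_runtime_literal = "string" in unfolded.co_consts
print(has_folded_literal, has_runtime_literal)
```

If the folded version already carries "string" as a constant while the run-time version only carries its two pieces, that matches the difference seen in the disassembly.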

The third example involves a run-time concatenation, the result of which is not automatically interned:

  # s3a = "strin"
  3           8 LOAD_CONST               1 ('strin')
             10 STORE_NAME               2 (s3a)

  # s3 = s3a + "g"
  4          12 LOAD_NAME                2 (s3a)
             14 LOAD_CONST               2 ('g')
             16 BINARY_ADD
             18 STORE_NAME               3 (s3)
             20 LOAD_CONST               3 (None)
             22 RETURN_VALUE

This bytecode was obtained with the following code, which first prints a few more lines that are exactly the same as the first block of bytecode given above:

import dis

source = (
    's1 = "string"\n'
    's2 = "strin" + "g"\n'
    's3a = "strin"\n'
    's3 = s3a + "g"')
code = compile(source, '', 'exec')
# dis.dis() prints the disassembly itself and returns None,
# so it does not need to be wrapped in print().
dis.dis(code)

If you were to manually sys.intern() the result of the third expression, you'd get the same object as before:

>>> import sys
>>> s3a = "strin"
>>> s3 = s3a + "g"
>>> s3 is "string"
False
>>> sys.intern(s3) is "string"
True

Also, Python 3.8 and later (including the 3.9 used here) print a warning for the last two statements above:

SyntaxWarning: "is" with a literal. Did you mean "=="?
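Because the identity of equal strings is implementation-specific, the usual approach is to compare values with == and to rely on identity only after interning explicitly. A minimal sketch follows; the helper name `join_at_run_time` is hypothetical and exists only to force a run-time concatenation.

```python
import sys


def join_at_run_time(prefix, suffix):
    # Hypothetical helper: both operands are parameters, so the
    # concatenation cannot be folded at compile time.
    return prefix + suffix


a = join_at_run_time("strin", "g")
b = join_at_run_time("strin", "g")

# Value comparison: the right tool for "do these strings match?"
values_match = a == b

# Identity after interning: sys.intern() returns the single interned
# copy for a given value, so both calls yield the same object.
interned_match = sys.intern(a) is sys.intern(b)

print(values_match, interned_match)
```

Comparing two names with `is` does not trigger the SyntaxWarning, since the warning only applies to `is` with a literal operand.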

NPE answered Oct 06 '22 06:10


Case 1

>>> x = "123"
>>> y = "123"
>>> x == y
True
>>> x is y
True
>>> id(x)
50986112
>>> id(y)
50986112

Case 2

>>> x = "12"
>>> y = "123"
>>> x = x + "3"
>>> x is y
False
>>> x == y
True

Now, your question is why the id is the same in case 1 but not in case 2.
In case 1, you have assigned the string literal "123" to both x and y.

Since strings are immutable, it makes sense for the interpreter to store the string literal only once and point all the variables to the same object.
Hence you see that the ids are identical.

In case 2, you are rebinding x to a new string built by concatenation at run time. x and y have the same value, but not the same identity.
They point to different objects in memory. Hence they have different ids, and the is operator returns False.
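The link between id and the is operator in case 2 can be sketched as a small script. This is an illustration rather than a guarantee: whether the two strings end up as one object depends on the interpreter, so the printed identity results are left unstated. The variable names `same_value`, `same_identity`, and `same_id` are made up for this example.

```python
y = "123"
x = "12"
x = x + "3"  # rebinds x to a string built at run time

same_value = x == y          # compares contents
same_identity = x is y       # compares object identity
same_id = id(x) == id(y)     # compares the ids of two live objects

print(same_value, same_identity, same_id)
```

For two objects that are both still alive, `x is y` and `id(x) == id(y)` always agree, which is why the answer can reason about either one.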

cppcoder answered Oct 06 '22 06:10