
Python not interning strings when in interactive mode?

When in a Python interactive session:

In [1]: a = "my string"

In [2]: b = "my string"

In [3]: a == b
Out[3]: True

In [4]: a is b
Out[4]: False

In [5]: import sys

In [6]: print(sys.version)
3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609]

On the other hand, when running the following program:

#!/usr/bin/env python

import sys


def test():
    a = "my string"
    b = "my string"
    print(a == b)
    print(a is b)


if __name__ == "__main__":
    test()
    print(sys.version)

The output is:

True
True
3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609]

Why does a is b have a different outcome in the two cases above?

I am aware of this answer (and of course of the difference between the == and is operators, which is the point of the question), but aren't a and b the same object in the first case (the interpreter) as well, since they refer to the same (immutable) string?

pkaramol asked Jan 24 '17 22:01



2 Answers

This is caused by string interning. See this question for another example.

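In your example, CPython shares the identical string constants that are compiled together in the module, but the REPL compiles each statement separately, so the two literals end up as distinct objects there.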
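As a rough sketch of the difference between the two situations, the snippet below compiles the two assignments once as a single unit and once as two separate units, which approximates how a module and an interactive session hand code to the compiler. The helper name same_object is made up for this illustration, and whether equal literals end up shared is a CPython implementation detail that may vary between versions, so the results are printed rather than assumed.

```python
def same_object(*sources):
    """Compile and run each source string separately in one namespace,
    then report whether the names a and b refer to the same object."""
    namespace = {}
    for index, source in enumerate(sources):
        code = compile(source, "<unit {}>".format(index), "exec")
        exec(code, namespace)
    return namespace["a"] is namespace["b"]


# Both literals in one compilation unit, as in a module or function body.
together = same_object('a = "my string"\nb = "my string"\n')

# One literal per compilation unit, as when typing line by line in a REPL.
separate = same_object('a = "my string"\n', 'b = "my string"\n')

print("compiled together:", together)
print("compiled separately:", separate)
```

Running this on the interpreter in question shows directly whether the compilation unit is what makes the difference.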

emulbreh answered Oct 25 '22 20:10


So the console creates two different objects for the two strings, but when the interpreter runs code inside a single function it reuses the same object for identical string literals. Here is how to check whether this is happening to you:

a = "my string"
b = "my string"

print(id(a))
print(id(b))

If these two ids are the same, then a is b returns True; if not, it returns False.
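A small sketch of the same check, this time with strings built at run time so the compiler cannot share them as constants, plus sys.intern from the standard library, which returns one shared copy for equal strings. The variable names are illustrative only.

```python
import sys

# Build the strings at run time so they are not compile-time constants.
a = " ".join(["my", "string"])
b = " ".join(["my", "string"])

print(a == b)                        # True: the values are equal
print((a is b) == (id(a) == id(b)))  # True: `is` agrees with comparing ids

# sys.intern returns one canonical object for equal strings.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)      # True: both names now share one object
```

Explicit interning like this only makes sense when many equal strings are kept around or compared often; for ordinary comparisons == is enough.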

It looks like you are using Anaconda, so I checked this in the console and found different ids; I then wrote a function in the editor, executed it, and got the same ids.

Note: Now that we know that is checks whether two names refer to the same object in memory, I should add that is should be used sparingly. It is usually used to compare against singletons like None (a is None, for example). So don't use it to compare objects by value; use == instead, and when creating classes implement the __eq__ method so that the == operator works for them.
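To make the note concrete, here is a minimal sketch with a made-up class (Label is hypothetical, not from the question) that implements __eq__, so that == compares values while is still compares identity.

```python
class Label:
    """Hypothetical value class used only for this illustration."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Only compare against other Label instances.
        if not isinstance(other, Label):
            return NotImplemented
        return self.text == other.text

    def __hash__(self):
        # Keep hashing consistent with __eq__.
        return hash(self.text)


first = Label("my string")
second = Label("my string")
missing = None

print(first == second)   # True: same value
print(first is second)   # False: two separate instances
print(missing is None)   # True: None is a singleton
```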

Amaury Larancuent answered Oct 25 '22 19:10