How can a non-assigned string in Python have an address in memory?

Can someone explain this to me? I've been playing with the id() function in Python and came across this:

    >>> id('cat')
    5181152
    >>> a = 'cat'
    >>> b = 'cat'
    >>> id(a)
    5181152
    >>> id(b)
    5181152

This makes some sense to me except for one part: The string 'cat' has an address in memory before I assign it to a variable. I probably just don't understand how memory addressing works but can someone explain this to me or at least tell me that I should read up on memory addressing?

So that is all well and good but this confused me further:

    >>> a = a[0:2]+'t'
    >>> a
    'cat'
    >>> id(a)
    39964224
    >>> id('cat')
    5181152

This struck me as weird because 'cat' is a string with an address of 5181152 but the new a has a different address. So if there are two 'cat' strings in memory why aren't two addresses printed for id('cat')? My last thought was that the concatenation had something to do with the change in address so I tried this:

    >>> id(b[0:2]+'t')
    39921024
    >>> b = b[0:2]+'t'
    >>> b
    'cat'
    >>> id(b)
    40000896

I would have predicted the IDs to be the same but that was not the case. Thoughts?

Usagi asked Aug 03 '11 19:08


2 Answers

Python reuses string literals fairly aggressively. The rules by which it does so are implementation-dependent, but CPython uses two that I'm aware of:

  • Strings that contain only characters valid in Python identifiers are interned, which means they are stored in a big table and reused wherever they occur. So, no matter where you use "cat", it always refers to the same string object.
  • String literals in the same code block are reused regardless of their content and length. If you put a string literal of the entire Gettysburg Address in a function, twice, it's the same string object both times. In separate functions, they are different objects (at least in the CPython this was tested on; newer versions may merge them when both functions are compiled together):

        def foo():
            return "pack my box with five dozen liquor jugs"

        def bar():
            return "pack my box with five dozen liquor jugs"

        assert foo() is bar()   # AssertionError

Both optimizations are done at compile time (that is, when the bytecode is generated).
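The two rules above can be checked with a small sketch, assuming CPython 3. The function names `twice`, `name_a` and `name_b` are made up for illustration, and other implementations are free to behave differently.

```python
def twice():
    # The same literal appears twice in one code block; the compiler
    # stores it once in this function's constants.
    first = "pack my box with five dozen liquor jugs"
    second = "pack my box with five dozen liquor jugs"
    return first, second


def name_a():
    return "cat"


def name_b():
    return "cat"


first, second = twice()
same_block_shared = first is second

# "cat" contains only identifier characters, so CPython interns it and
# both functions hand back the same object.
identifier_like_shared = name_a() is name_b()

print(same_block_shared, identifier_like_shared)
```

Looking at `twice.__code__.co_consts` shows the long literal stored only once, which is where the reuse within a code block comes from.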

On the other hand, something like chr(99) + chr(97) + chr(116) is a string expression that evaluates to the string "cat". In a dynamic language like Python, its value can't be known at compile time (chr() is a built-in function, but you might have reassigned it) so it normally isn't interned. Thus its id() is different from that of "cat". However, you can force a string to be interned using the intern() function (sys.intern() in Python 3). Thus:

id(intern(chr(99) + chr(97) + chr(116))) == id("cat")   # True 
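The same check written for Python 3, where the function is `sys.intern()` rather than a built-in. This is a sketch that assumes CPython; the variable names are illustrative.

```python
import sys

literal = "cat"

# Built at run time from function calls, so the compiler cannot
# precompute or intern the result.
built = chr(99) + chr(97) + chr(116)

# sys.intern() returns the canonical interned object for this text.
interned = sys.intern(built)

print(built == literal)             # same contents
print(id(interned) == id(literal))  # same object after interning
```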

As others have mentioned, interning is possible because strings are immutable. It isn't possible to change "cat" to "dog", in other words. You have to generate a new string object, which means that there's no danger that other names pointing to the same string will be affected.

Just as an aside, Python also converts expressions containing only constants (like "c" + "a" + "t") to constants at compile time, as the disassembly below shows. These will be optimized to point to identical string objects per the rules above.

    >>> def foo(): "c" + "a" + "t"
    ...
    >>> from dis import dis; dis(foo)
      1           0 LOAD_CONST               5 ('cat')
                  3 POP_TOP
                  4 LOAD_CONST               0 (None)
                  7 RETURN_VALUE
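
On a current interpreter the bytecode layout differs from the listing above, so a less version-sensitive way to see the folding is to look at the function's constants. A minimal sketch, assuming CPython 3; `folded` and `not_folded` are hypothetical names.

```python
import dis


def folded():
    # Only constants are involved, so the compiler can precompute this.
    return "c" + "a" + "t"


def not_folded(prefix):
    # `prefix` is only known at run time, so nothing can be precomputed.
    return prefix + "t"


folded_consts = folded.__code__.co_consts
dynamic_consts = not_folded.__code__.co_consts

print(folded_consts)
print(dynamic_consts)
dis.dis(folded)
```

The folded function carries the finished string among its constants, while the second one only carries the piece it was given as a literal.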

kindall answered Sep 22 '22 08:09


'cat' has an address because you create it in order to pass it to id(). You haven't yet bound it to a name, but the object still exists.

CPython caches and reuses short string literals. But if you assemble a string by concatenation at run time, the code that searches the cache and attempts reuse is bypassed.

Note that the inner workings of the string cache are purely an implementation detail and should not be relied upon.
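
In practice this means comparing strings by value rather than by identity. A small sketch using the question's own slicing example; the variable names are illustrative, and the identity result is deliberately not stated because it depends on the implementation.

```python
a = "cat"
b = a[0:2] + "t"  # assembled at run time

equal_by_value = a == b  # compares contents; this is the reliable check
same_object = a is b     # compares identity; depends on the implementation

print(equal_by_value)
print(same_object)
```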

David Heffernan answered Sep 22 '22 08:09