Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python's layout of low-value ints in memory

Tags:

python

memory

My question is: where do these patterns (below) originate?

I learned (somewhere) that Python has unique "copies", if that's the right word, for small integers. For example:

>>> x = y = 0
>>> id(0)
4297074752
>>> id(x)
4297074752
>>> id(y)
4297074752
>>> x += 1
>>> id(x)
4297074728
>>> y
0

When I look at the memory locations of ints, there is a simple pattern early:

>>> N = id(0)
>>> for i in range(5):
...     print i, N - id(i)
... 
0 0
1 24
2 48
3 72
4 96
>>> bin(24)
'0b11000'

It's not clear to me why this is chosen as the increment. Moreover, I can't explain this pattern at all above 256:

>>> prev = 0
>>> for i in range(270):
...     t = (id(i-1), id(i))
...     diff = t[0] - t[1]
...     if diff != prev:
...         print i-1, i, t, diff
...         prev = diff
... 
-1 0 (4297074776, 4297074752) 24
35 36 (4297073912, 4297075864) -1952
36 37 (4297075864, 4297075840) 24
76 77 (4297074904, 4297076856) -1952
77 78 (4297076856, 4297076832) 24
117 118 (4297075896, 4297077848) -1952
118 119 (4297077848, 4297077824) 24
158 159 (4297076888, 4297078840) -1952
159 160 (4297078840, 4297078816) 24
199 200 (4297077880, 4297079832) -1952
200 201 (4297079832, 4297079808) 24
240 241 (4297078872, 4297080824) -1952
241 242 (4297080824, 4297080800) 24
256 257 (4297080464, 4297155264) -74800
257 258 (4297155072, 4297155288) -216
259 260 (4297155072, 4297155336) -264
260 261 (4297155048, 4297155432) -384
261 262 (4297155024, 4297155456) -432
262 263 (4297380280, 4297155384) 224896
263 264 (4297155000, 4297155240) -240
264 265 (4297155072, 4297155216) -144
266 267 (4297155072, 4297155168) -96
267 268 (4297155024, 4297155144) -120

Any thoughts, clues, places to look?

Edit: and what's special about 24?

Update: The standard library has sys.getsizeof() which returns 24 when I call it with 1 as argument. That's a lot of bytes, but on a 64-bit machine, we have 8 bytes each for the type, the value and the ref count. Also, see here, and the C API reference here.

Spent some time with the "source" in the link from Peter Hansen in comments. Couldn't find the definition of an int (beyond a declaration of *int_int), but I did find:

#define NSMALLPOSINTS       257
#define NSMALLNEGINTS       5
like image 996
telliott99 Avatar asked Feb 03 '10 22:02

telliott99


1 Answers

Low-value integers are preallocated, high value integers are allocated whenever they are computed. Integers that appear in source code are the same object. On my system,

>>> id(2) == id(1+1)
True
>>> id(1000) == id(1000+0)
False
>>> id(1000) == id(1000)
True

You'll also notice that the ids depend on the system. They're just memory addresses, assigned by the system allocator (or possibly the linker, for static objects?)

>>> id(0)
8402324

Edit: The reason id(1000) == id(1000) is because the Python compiler notices that two of the integer constants in the code it's compiling are the same, so it only allocates one object for both. This would be an unacceptable performance hit at runtime, but at compile time it's not noticeable. (Yes, the interpreter is also a compiler. Most interpreters are also compilers, very few aren't.)

like image 129
Dietrich Epp Avatar answered Sep 18 '22 11:09

Dietrich Epp