My question is: where do these patterns (below) originate?
I learned (somewhere) that Python has unique "copies", if that's the right word, for small integers. For example:
>>> x = y = 0
>>> id(0)
4297074752
>>> id(x)
4297074752
>>> id(y)
4297074752
>>> x += 1
>>> id(x)
4297074728
>>> y
0
When I look at the memory locations of ints, there is a simple pattern early:
>>> N = id(0)
>>> for i in range(5):
...     print i, N - id(i)
...
0 0
1 24
2 48
3 72
4 96
>>> bin(24)
'0b11000'
It's not clear to me why this is chosen as the increment. Moreover, I can't explain this pattern at all above 256:
>>> prev = 0
>>> for i in range(270):
...     t = (id(i-1), id(i))
...     diff = t[0] - t[1]
...     if diff != prev:
...         print i-1, i, t, diff
...     prev = diff
...
-1 0 (4297074776, 4297074752) 24
35 36 (4297073912, 4297075864) -1952
36 37 (4297075864, 4297075840) 24
76 77 (4297074904, 4297076856) -1952
77 78 (4297076856, 4297076832) 24
117 118 (4297075896, 4297077848) -1952
118 119 (4297077848, 4297077824) 24
158 159 (4297076888, 4297078840) -1952
159 160 (4297078840, 4297078816) 24
199 200 (4297077880, 4297079832) -1952
200 201 (4297079832, 4297079808) 24
240 241 (4297078872, 4297080824) -1952
241 242 (4297080824, 4297080800) 24
256 257 (4297080464, 4297155264) -74800
257 258 (4297155072, 4297155288) -216
259 260 (4297155072, 4297155336) -264
260 261 (4297155048, 4297155432) -384
261 262 (4297155024, 4297155456) -432
262 263 (4297380280, 4297155384) 224896
263 264 (4297155000, 4297155240) -240
264 265 (4297155072, 4297155216) -144
266 267 (4297155072, 4297155168) -96
267 268 (4297155024, 4297155144) -120
Any thoughts, clues, places to look?
Edit: and what's special about 24?
Update: The standard library has sys.getsizeof(), which returns 24 when I call it with 1 as the argument. That's a lot of bytes, but on a 64-bit machine we have 8 bytes each for the type, the value and the ref count. Also, see here, and the C API reference here.
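As a rough check of that arithmetic, here is a small sketch (Python 3 syntax) that adds up the three fields using the struct module. The field names in the comments are my assumption about the classic CPython 2 int layout, not something quoted from the source, and Python 3 ints are variable-length, so sys.getsizeof(1) will not necessarily match the total there.

```python
import struct
import sys

# Assumed layout of a Python 2 int object (names are illustrative):
#   Py_ssize_t ob_refcnt;   reference count
#   PyTypeObject *ob_type;  pointer to the type
#   long ob_ival;           the value itself
refcount_size = struct.calcsize("n")  # size of a C Py_ssize_t
type_ptr_size = struct.calcsize("P")  # size of a C pointer
value_size = struct.calcsize("l")     # size of a C long

total = refcount_size + type_ptr_size + value_size
print("refcount:", refcount_size, "type:", type_ptr_size, "value:", value_size)
print("sum of the three fields:", total)

# What the running interpreter reports for the int 1 (differs on Python 3).
print("sys.getsizeof(1):", sys.getsizeof(1))
```

On a typical 64-bit Linux or macOS build each of the three sizes is 8, which gives the 24 seen in the address differences. On platforms where a C long is 4 bytes, the sum comes out differently.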
Spent some time with the "source" in the link from Peter Hansen in comments. Couldn't find the definition of an int (beyond a declaration of *int_int), but I did find:
#define NSMALLPOSINTS 257
#define NSMALLNEGINTS 5
Low-value integers are preallocated; higher-value integers are allocated whenever they are computed. Equal integer constants that appear in the same piece of compiled source code are the same object. On my system,
>>> id(2) == id(1+1)
True
>>> id(1000) == id(1000+0)
False
>>> id(1000) == id(1000)
True
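The same effect can be checked without id() by comparing identity directly. This is only a sketch (Python 3 syntax), and same_object is a helper name I made up for it. It builds each value twice at runtime, from a string, so the compiler has no chance to merge constants. The result depends on a CPython implementation detail, and the exact boundary of the preallocated range can differ between versions.

```python
def same_object(n):
    """Return True if two separately computed ints equal to n are one object."""
    # Parse from a string at runtime so neither value is a compile-time constant.
    a = int(str(n))
    b = int(str(n))
    return a is b

# Values inside and outside the range suggested by NSMALLNEGINTS / NSMALLPOSINTS.
for n in (-6, -5, 0, 256, 257, 1000):
    print(n, same_object(n))
```

On a build with the NSMALLPOSINTS and NSMALLNEGINTS values quoted above, I would expect True for -5 through 256 and False outside that range.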
You'll also notice that the ids depend on the system. In CPython they're just memory addresses, assigned by the system allocator (or possibly the linker, for static objects?).
>>> id(0)
8402324
Edit: The reason id(1000) == id(1000) is True is that the Python compiler notices that two of the integer constants in the code it's compiling are the same, so it allocates only one object for both. Doing this kind of deduplication for every integer computed at runtime would be an unacceptable performance hit, but at compile time it's not noticeable. (Yes, the interpreter is also a compiler. Most interpreters are also compilers; very few aren't.)
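One way to see the constant merging is to compile a snippet by hand and look at the constants stored on the code object. This is a minimal sketch (Python 3 syntax) that assumes CPython. The filename "<demo>" and the variable names are placeholders.

```python
# Compile two assignments that use the same literal, as one unit.
source = "a = 1000\nb = 1000\nsame = a is b"
code = compile(source, "<demo>", "exec")

# The code object keeps one table of constants; 1000 should appear only once.
print(code.co_consts)

namespace = {}
exec(code, namespace)
# Both names were loaded from the same constant slot, so they are one object.
print(namespace["same"])
```

Since both literals end up in the same slot of co_consts, a and b refer to one object even though 1000 is outside the preallocated range.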