How does Python allocate memory for large integers?
An int type has a size of 28 bytes, and as I keep increasing the value of the int, the size increases in increments of 4 bytes.
Why 28 bytes initially for any value as low as 1?
Why increments of 4 bytes?
PS: I am running Python 3.5.2 on an x86_64 (64-bit) machine. Any pointers/resources/PEPs on how the (3.0+) interpreters work with such huge numbers are what I am looking for.
Code illustrating the sizes:
>>> a=1
>>> print(a.__sizeof__())
28
>>> a=1024
>>> print(a.__sizeof__())
28
>>> a=1024*1024*1024
>>> print(a.__sizeof__())
32
>>> a=1024*1024*1024*1024
>>> print(a.__sizeof__())
32
>>> a=1024*1024*1024*1024*1024*1024
>>> a
1152921504606846976
>>> print(a.__sizeof__())
36
In many languages, a fixed number of bytes is allocated in memory for each variable of a normal integer type: typically four bytes, or 32 bits, with values whose binary representations require fewer than 32 bits padded on the left with 0s. Python 3's int does not work this way; as the sizes above show, its storage grows with the magnitude of the value.
Allocating Memory for Integer Variables: An integer is a whole number with no fractional part. In assembler, variables are created by data allocation directives. An assembler declaration of an integer variable assigns a label to the memory space allocated for the integer; in such a declaration, an initializer such as 77h specifies the initial value.
Memory allocation can be defined as reserving a block of space in the computer's memory for a program. In Python, memory allocation and deallocation are automatic: the interpreter includes a garbage collector, so the user does not have to manage memory manually.
Python can reduce memory use by binding a new variable to an existing object instead of creating a new one with the same value. CPython does this for small integers (-5 through 256), which are cached and shared. This is an implementation detail rather than a language guarantee.
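The reuse described above can be observed directly. This is a minimal sketch that assumes CPython, whose small-integer cache covers -5 through 256; the helper name `fresh_int` is made up for this illustration, and it builds the values at run time so the compiler cannot merge identical literals:

```python
def fresh_int(text):
    """Build an int at run time so identical literals are not merged by the compiler."""
    return int(text)

# Values inside CPython's small-integer cache come back as the same object.
small_a, small_b = fresh_int("256"), fresh_int("256")
print(small_a is small_b)

# Larger values are separate objects that merely compare equal.
big_a, big_b = fresh_int("257"), fresh_int("257")
print(big_a == big_b, big_a is big_b)
```

On CPython this prints True, then True False. Other implementations may behave differently, so code should compare integers with == rather than is.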
Why 28 bytes initially for any value as low as 1?
I believe @bgusach answered that completely; Python uses C structs to represent objects in the Python world, any objects including ints:
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
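PyObject_VAR_HEAD is a macro that, when expanded, adds another field to the struct (a PyVarObject, which is used specifically for objects that have some notion of length), and ob_digit is an array holding the value of the number. The boilerplate in size comes from that struct, for small and large Python numbers alike. On a 64-bit build the header holds a reference count, a type pointer and a length field of 8 bytes each, so 24 bytes of header plus one 4-byte digit gives the 28 bytes you observe.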
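The split between fixed overhead and payload can be checked from Python itself. This is a rough sketch that assumes CPython, where `int.__basicsize__` reports the bytes that precede the digit array and `int.__itemsize__` reports the size of one digit:

```python
import sys

header_bytes = int.__basicsize__  # fixed part of the struct, before ob_digit
digit_bytes = int.__itemsize__    # sizeof(digit)

# A one-digit value such as 1 costs the header plus a single digit.
print(header_bytes, digit_bytes, sys.getsizeof(1))
```

On a setup like the asker's (CPython 3.5 on x86_64) this should print 24 4 28; other builds may report different numbers, but the last value is still the sum of the first two.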
Why increments of 4 bytes?
Because when a larger number is created, its size (in bytes) grows by a multiple of sizeof(digit); you can see that in _PyLong_New, where the memory for a new longobject is allocated with PyObject_MALLOC:
/* Number of bytes needed is: offsetof(PyLongObject, ob_digit) +
   sizeof(digit)*size.  Previous incarnations of this code used
   sizeof(PyVarObject) instead of the offsetof, but this risks being
   incorrect in the presence of padding between the PyVarObject header
   and the digits. */
if (size > (Py_ssize_t)MAX_LONG_DIGITS) {
    PyErr_SetString(PyExc_OverflowError,
                    "too many digits in integer");
    return NULL;
}
result = PyObject_MALLOC(offsetof(PyLongObject, ob_digit) +
                         size*sizeof(digit));
offsetof(PyLongObject, ob_digit) is the 'boilerplate' (in bytes) of the long object that isn't related to holding its value.
digit is defined in the header file holding the struct _longobject as a typedef for uint32_t:
typedef uint32_t digit;
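and sizeof(uint32_t) is 4 bytes. That's the amount by which you'll see the size in bytes increase when the size argument to _PyLong_New increases. Each digit stores 30 bits of the value on such a build, which is why 1024*1024*1024 (2**30) is the first value in your examples to need a second digit.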
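The growth pattern in the question can be predicted from the digit size. This is a sketch that assumes CPython and reads the digit parameters from `sys.int_info`; `predicted_size` is a hypothetical helper written for this illustration, not part of CPython, and it is only meant for nonzero values:

```python
import sys

def predicted_size(n):
    """Estimate sys.getsizeof(n) for a nonzero int on CPython."""
    bits_per_digit = sys.int_info.bits_per_digit
    # Ceiling division: how many digits are needed to hold abs(n).
    ndigits = -(-abs(n).bit_length() // bits_per_digit)
    return int.__basicsize__ + ndigits * sys.int_info.sizeof_digit

# The same values as in the question, with predicted and reported sizes.
for n in (1, 1024, 1024**3, 1024**4, 1024**6):
    print(n, predicted_size(n), sys.getsizeof(n))
```

The two size columns should agree for each value, stepping up by one digit size each time the bit length crosses a multiple of `bits_per_digit`.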
Of course, this is just how CPython has chosen to implement it. It is an implementation detail, and as such you won't find much information in PEPs. The python-dev mailing list would hold implementation discussions, if you can find the corresponding thread :-).
Either way, you might find differing behavior in other popular implementations, so don't take this one for granted.
It's actually easy. Python's int is not the kind of primitive you may be used to from other languages, but a full-fledged object, with its methods and all the rest. That is where the overhead comes from.
Then, you have the payload itself, the integer that is being represented. And there is no limit for that, except your memory.
The size of a Python int is what it needs to represent the number, plus a little overhead.
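Both points, that an int is an object with methods and that its value is bounded only by memory, are easy to see interactively. A small sketch using only built-in behavior:

```python
n = 2 ** 1000                # far beyond any fixed 32- or 64-bit range

print(type(n).__name__)      # int: still the ordinary integer type
print(n.bit_length())        # 1001: ints carry methods like any other object
print(len(str(n)))           # 302: number of decimal digits in the value
```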
If you want to read further, take a look at the relevant part of the documentation:
Integers have unlimited precision