Understanding memory allocation for large integers in Python

How does Python allocate memory for large integers?

An int has a size of 28 bytes, and as I keep increasing its value, the size grows in increments of 4 bytes.

Why 28 bytes initially for any value as low as 1?
Why increments of 4 bytes?

PS: I am running Python 3.5.2 on an x86_64 (64-bit) machine. Any pointers/resources/PEPs on how the (3.0+) interpreters work with such huge numbers are what I am looking for.

Code illustrating the sizes:

>>> a=1
>>> print(a.__sizeof__())
28
>>> a=1024
>>> print(a.__sizeof__())
28
>>> a=1024*1024*1024
>>> print(a.__sizeof__())
32
>>> a=1024*1024*1024*1024
>>> print(a.__sizeof__())
32
>>> a=1024*1024*1024*1024*1024*1024
>>> a
1152921504606846976
>>> print(a.__sizeof__())
36

asked Oct 31 '16 14:10

Vigneshwaren

2 Answers

Why 28 bytes initially for any value as low as 1?

I believe @bgusach answered that completely; Python uses C structs to represent every object in the Python world, ints included:

struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};

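PyObject_VAR_HEAD is a macro that, when expanded, adds another field to the struct (a field of type PyVarObject, which is used specifically for objects that have some notion of length), and ob_digit is an array holding the value of the number. The boilerplate in the size comes from that struct, for small and large Python numbers alike. On a 64-bit build such as the asker's, the header takes 24 bytes (reference count, type pointer and size, 8 bytes each), and a single 4-byte digit brings the total to 28.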
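
As a rough check of where the 28 bytes come from, the sketch below adds up the header fields from Python. It assumes a standard CPython build whose int header consists of three pointer-sized fields; the variable names are only illustrative, and other builds or implementations may lay things out differently.

```python
import ctypes
import sys

# Assumed header layout on a standard CPython build: three pointer-sized
# fields (reference count, type pointer, size field).
refcount_size = ctypes.sizeof(ctypes.c_ssize_t)
type_pointer_size = ctypes.sizeof(ctypes.c_void_p)
size_field_size = ctypes.sizeof(ctypes.c_ssize_t)
header_size = refcount_size + type_pointer_size + size_field_size

# Size in bytes of one C "digit", as reported by the interpreter itself.
digit_size = sys.int_info.sizeof_digit

print("header bytes:       ", header_size)
print("digit bytes:        ", digit_size)
print("header + one digit: ", header_size + digit_size)
print("sys.getsizeof(1):   ", sys.getsizeof(1))
```

On a typical 64-bit build the last two lines should agree, with 24 bytes of header plus one 4-byte digit.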

Why increments of 4 bytes?

Because when a larger number is created, the space reserved for its value is a multiple of sizeof(digit); you can see that in _PyLong_New, where the memory for a new longobject is allocated with PyObject_MALLOC:

/* Number of bytes needed is: offsetof(PyLongObject, ob_digit) +
   sizeof(digit)*size.  Previous incarnations of this code used
   sizeof(PyVarObject) instead of the offsetof, but this risks being
   incorrect in the presence of padding between the PyVarObject header
   and the digits. */
if (size > (Py_ssize_t)MAX_LONG_DIGITS) {
    PyErr_SetString(PyExc_OverflowError,
                    "too many digits in integer");
    return NULL;
}
result = PyObject_MALLOC(offsetof(PyLongObject, ob_digit) +
                         size*sizeof(digit));

(offsetof(PyLongObject, ob_digit) is the 'boilerplate', in bytes, of the long object that is unrelated to holding its value.)

digit is defined in the header file holding the struct _longobject as a typedef for uint32_t (on builds that use 30-bit digits, which is the usual case on 64-bit machines):

typedef uint32_t digit;

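and sizeof(uint32_t) is 4 bytes. That's the amount by which you'll see the size in bytes increase when the size argument to _PyLong_New increases. Each digit stores 30 bits of the value, which is why the sizes in the question jump at 2**30 (1024*1024*1024) and again at 2**60.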
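
The step can be observed from Python without reading the C source. This is a small sketch that relies on sys.int_info, which reports how many bits each digit holds and how many bytes a digit occupies on the running build; the loop bounds are arbitrary.

```python
import sys

bits_per_digit = sys.int_info.bits_per_digit  # value bits stored per digit
digit_size = sys.int_info.sizeof_digit        # bytes occupied by one digit

for n_digits in range(1, 5):
    # Largest value that still fits in n_digits digits, and the next one up.
    largest = 2 ** (bits_per_digit * n_digits) - 1
    next_up = largest + 1
    print(
        n_digits,
        sys.getsizeof(largest),
        sys.getsizeof(next_up),
        sys.getsizeof(next_up) - sys.getsizeof(largest),
    )
```

The last column is the growth when one more digit is needed, and it should match digit_size on each row.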

Of course, this is just how CPython has chosen to implement it. It is an implementation detail, and as such you won't find much information in PEPs. The python-dev mailing list would hold implementation discussions if you can find the corresponding thread :-).

Either way, you might find differing behavior in other popular implementations, so don't take this one for granted.

answered Oct 19 '22 17:10

Dimitris Fasarakis Hilliard

It's actually easy. Python's int is not the kind of primitive you may be used to from other languages, but a full-fledged object, with its own methods and everything else that comes with being an object. That is where the overhead comes from.

Then, you have the payload itself, the integer that is being represented. And there is no limit for that, except your memory.

The size of a Python int is whatever it needs to represent the number, plus a little overhead.
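
To make "payload plus overhead" concrete, here is a minimal sketch that estimates the size of a nonzero int from its bit length and compares the estimate with what the interpreter reports. The helper predicted_size is hypothetical, written for this illustration, and it assumes CPython's digit-based representation as exposed through sys.int_info.

```python
import sys

BITS_PER_DIGIT = sys.int_info.bits_per_digit
DIGIT_SIZE = sys.int_info.sizeof_digit
# Assumption: the fixed overhead is the size of a one-digit int minus its digit.
OVERHEAD = sys.getsizeof(1) - DIGIT_SIZE


def predicted_size(n):
    """Estimate sys.getsizeof(n) for a nonzero int n (hypothetical helper)."""
    # Number of digits needed for the magnitude, via ceiling division.
    digits = -(-n.bit_length() // BITS_PER_DIGIT)
    return OVERHEAD + digits * DIGIT_SIZE


for n in (1, 2**30, 2**60, -(2**100), 10**100):
    print(n.bit_length(), predicted_size(n), sys.getsizeof(n))
```

Zero is left out on purpose, because how it is stored and reported has differed between CPython versions.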

If you want to read further, take a look at the relevant part of the documentation:

Integers have unlimited precision

answered Oct 19 '22 17:10

bgusach

Tags:

python

int

python-3.x

python-internals