On a 64-bit system an integer in Python takes 24 bytes. This is 3 times the memory that would be needed in e.g. C for a 64-bit integer. Now, I know this is because Python integers are objects. But what is the extra memory used for? I have my guesses, but it would be nice to know for sure.
Getting the size of an integer: to store the number 0, Python uses 24 bytes, even though the value zero itself would fit in a single bit (1 byte equals 8 bits). You can therefore think of those 24 bytes as the overhead of storing an integer object.
In Python 2, plain integers represent numbers in the range -2147483648 through 2147483647. (The range may be larger on machines with a larger natural word size, but not smaller.)
Remember that the Python int type does not have a limited range like C int has; the only limit is the available memory.
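As a minimal sketch of that unlimited range (the variable names here are only illustrative), a value one past the largest signed 64-bit C integer is still an ordinary Python int:

```python
# A signed 64-bit C integer tops out at 2**63 - 1; a Python int keeps growing.
c_int64_max = 2**63 - 1
beyond = c_int64_max + 1      # no overflow and no wraparound
huge = 2**200                 # far outside any fixed-width C integer

print(beyond)                 # 9223372036854775808
print(beyond.bit_length())    # 64
print(huge.bit_length())      # 201
```

The price of this flexibility is the variable-size storage discussed in the rest of the answer.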
Memory goes to storing the value, the current size of the integer storage (the storage size is variable to support arbitrary sizes), and the standard Python object bookkeeping (a reference to the relevant type object and a reference count).
You can look up the longintrepr.h source (the Python 3 int type was traditionally known as the long type in Python 2); it makes effective use of the PyVarObject C type to track integer size:
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
The ob_digit array stores 'digits' that are either 15 or 30 bits wide (depending on your platform); so on my 64-bit OS X system, an integer up to 2**30 - 1 uses 1 'digit':
>>> import sys
>>> sys.getsizeof((1 << 30) - 1)
28
but once the number needs two 30-bit digits, an additional 4 bytes are required, and so on for every further digit:
>>> sys.getsizeof(1 << 30)
32
>>> sys.getsizeof(1 << 60)
36
>>> sys.getsizeof(1 << 90)
40
The base 24 bytes, then, are the PyObject_VAR_HEAD structure, holding the object size, the reference count and the type pointer (each 8 bytes / 64 bits on my 64-bit OS X platform).
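To make the arithmetic concrete, here is a rough sketch that adds up the three header fields using `ctypes` type sizes. The field-to-type mapping is an assumption based on the structure described above and applies to a default (non-debug) CPython build:

```python
import ctypes

# Assumed layout of PyObject_VAR_HEAD on a default CPython build.
header_fields = {
    "ob_refcnt": ctypes.sizeof(ctypes.c_ssize_t),  # reference count
    "ob_type": ctypes.sizeof(ctypes.c_void_p),     # pointer to the type object
    "ob_size": ctypes.sizeof(ctypes.c_ssize_t),    # number of digits in use
}
header_bytes = sum(header_fields.values())

for name, size in header_fields.items():
    print(f"{name}: {size} bytes")
print("header total:", header_bytes)
print("int.__basicsize__:", int.__basicsize__)     # what the interpreter itself reports
```

On a typical 64-bit build each field is 8 bytes, giving the 24-byte base; a 32-bit build would report 4 bytes per field instead.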
On Python 2, integers <= sys.maxint but >= -sys.maxint - 1 are stored using a simpler structure storing just the single value:
typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;
Because this uses PyObject instead of PyVarObject, there is no ob_size field in the struct, and the memory size is just 24 bytes: 8 for the long value, 8 for the reference count and 8 for the type object pointer.
From longintrepr.h, we see that a Python 'int' object is defined with this C structure:
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
A digit is a 32-bit unsigned value (on builds that use 30-bit digits). The bulk of the space is taken by the variable-size object header. From object.h, we can find its definition:
typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;
typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;
We can see that we are using a Py_ssize_t, 64 bits wide on a 64-bit system, to store the count of "digits" in the value. This is possibly wasteful. We can also see that the general object header has a 64-bit reference count and a pointer to the object type, which also takes 64 bits of storage. The reference count is necessary for Python to know when to deallocate the object, and the pointer to the object type is necessary to know that we have an int and not, say, a string, as C gives us no way to determine the type of an object from an arbitrary pointer.
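Both header fields can be observed from Python itself. The following is a small sketch, assuming CPython: `type()` reports what the type pointer refers to, and `sys.getrefcount()` reads the reference count. The helper `refcount_delta` is hypothetical and exists only for this illustration:

```python
import sys

def refcount_delta():
    """Return how much the reference count changes when one more reference is added."""
    n = int("123456789012345678901234567890")  # built at runtime, so not a shared constant
    before = sys.getrefcount(n)
    holder = [n]                                # the list now holds one extra reference
    after = sys.getrefcount(n)
    return after - before

n = 10**20
print(type(n))             # the type is found through the header's type pointer
print(refcount_delta())    # expected to be 1 on CPython
```

The absolute value returned by `sys.getrefcount()` varies between interpreter versions, which is why the sketch only looks at the difference.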
_PyObject_HEAD_EXTRA is defined to nothing on most builds of Python, but it can be used to store a linked list of all Python objects on the heap if the build enables that option, using another two pointers of 64 bits each.