Say I have a class A:
class A(object):
    def __init__(self, x):
        self.x = x

    def __str__(self):
        # __str__ must return a string, so convert non-string values.
        return str(self.x)
And I use sys.getsizeof to see how many bytes an instance of A takes:
>>> sys.getsizeof(A(1))
64
>>> sys.getsizeof(A('a'))
64
>>> sys.getsizeof(A('aaa'))
64
As illustrated in the experiment above, the size of an A object is the same no matter what self.x is.
So I wonder how Python stores an object internally?
It depends on what kind of object, and also which Python implementation :-)
In CPython, which is what most people use when they use Python, all Python objects are represented by a C struct, PyObject. Everything that 'stores an object' really stores a PyObject *. The PyObject struct holds the bare minimum information: the object's type (a pointer to another PyObject) and its reference count (an ssize_t-sized integer). Types defined in C extend this struct with extra information they need to store in the object itself, and sometimes allocate extra data separately.
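Both header fields can be observed from Python itself. The following is a minimal sketch, assuming a current CPython 3 build; the variable names are only illustrative.

```python
import sys

value = ["some", "data"]

# The type pointer in the PyObject header is what type() reports.
print(type(value))

# The reference count is visible through sys.getrefcount(). The number it
# returns includes the temporary reference held by the call's own argument.
before = sys.getrefcount(value)
alias = value                  # "storing" the object only copies a pointer
after = sys.getrefcount(value)

print(after - before)          # one additional reference, no copy of the data
print(alias is value)          # both names refer to the same object
```

Binding a second name to the list raises the count by one and leaves the data where it is, which is what "everything stores a PyObject *" means in practice.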
For example, tuples (implemented as a PyTupleObject "extending" a PyObject struct) store their length and the PyObject pointers they contain inside the struct itself (the struct contains a 1-length array in the definition, but the implementation allocates a block of memory of the right size to hold the PyTupleObject struct plus exactly as many items as the tuple should hold). In the same way, strings (PyStringObject) store their length, their cached hash value, some string-caching ("interning") bookkeeping, and the actual char data, all inside the struct. Tuples and strings are thus single blocks of memory.
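Because a tuple's item pointers live inside the struct, its reported size should grow by exactly one pointer per item. A small sketch to check this, assuming CPython 3 (POINTER_SIZE is just a helper name introduced here):

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")   # size of a C pointer on this build

# Sizes of tuples holding 0, 1, 2, 3 and 4 items.
sizes = [sys.getsizeof(tuple(range(n))) for n in range(5)]

# Difference between each size and the previous one.
steps = [b - a for a, b in zip(sizes, sizes[1:])]

print(POINTER_SIZE, sizes, steps)
```

The absolute sizes vary between CPython versions and platforms; the per-item step is the part that reflects the layout described above.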
On the other hand, lists (PyListObject) store their length, a PyObject ** for their data and another ssize_t to keep track of how much room they allocated for the data. Because Python stores PyObject pointers everywhere, you can't grow a PyObject struct once it's allocated -- doing so may require the struct to move, which would mean finding all pointers and updating them. Because a list may need to grow, it has to allocate the data separately from the PyObject struct. Tuples and strings cannot grow, and so they don't need this. Dicts (PyDictObject) work the same way, although they store the key, the value and the cached hash value of the key, instead of just the items. Dicts also have some extra overhead to accommodate small dicts and specialized lookup functions.
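The separate, over-allocated item array of a list shows up when you watch sys.getsizeof while appending. This is a sketch assuming CPython 3, where the list's reported size includes the allocated item array; the exact growth points are an implementation detail and differ between versions.

```python
import sys

items = []
identity = id(items)           # the list struct itself never moves
sizes = [sys.getsizeof(items)]

for i in range(64):
    items.append(i)
    sizes.append(sys.getsizeof(items))

# Lengths at which the reported size changed, i.e. where the separately
# allocated item array had to be reallocated.
growth_points = [n for n in range(1, len(sizes)) if sizes[n] != sizes[n - 1]]

print(growth_points)
print(id(items) == identity)
```

There are far fewer growth points than appends, because each reallocation reserves spare room, and the list object keeps its identity throughout.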
But these are all types in C, and you can usually see how much memory they would use just by looking at the C source. Instances of classes defined in Python rather than C are not so easy. The simplest case, instances of classic classes, is not so difficult: it's a PyObject that stores a PyObject * to its class (which is not the same thing as the type stored in the PyObject struct already), a PyObject * to its __dict__ attribute (which holds all other instance attributes) and a PyObject * to its weakreflist (which is used by the weakref module, and only initialized if necessary). The instance's __dict__ is usually unique to the instance, so when calculating the "memory size" of such an instance you usually want to count the size of the attribute dict as well. But it doesn't have to be specific to the instance! __dict__ can be assigned to just fine.
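This is why the experiment in the question always prints the same number: sys.getsizeof only reports the instance struct, not the attribute dict or the values it refers to. A sketch assuming CPython 3 (where every class is new-style, but the same reasoning applies):

```python
import sys

class A(object):
    def __init__(self, x):
        self.x = x

small = A(1)
large = A("a" * 10000)

# getsizeof() reports only the instance struct, so both sizes match.
same_size = sys.getsizeof(small) == sys.getsizeof(large)
print(same_size)

# The attribute dict and the value it refers to are separate objects
# with their own sizes.
print(sys.getsizeof(large.__dict__), sys.getsizeof(large.x))

# __dict__ can be reassigned, so two instances may share one attribute dict.
other = A(2)
other.__dict__ = small.__dict__
other.x = "shared"
print(small.x)
```

After the reassignment, writing through one instance is visible through the other, so neither instance clearly "owns" the dict.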
New-style classes complicate matters. Unlike with classic classes, instances of new-style classes are not separate C types, so they do not need to store the object's class separately. They do have room for the __dict__ and weakreflist references, but unlike classic instances they don't require the __dict__ attribute for arbitrary attributes. If the class (and all its base classes) use __slots__ to define a strict set of attributes, and none of those attributes is named __dict__, the instance does not allow arbitrary attributes and no dict is allocated. On the other hand, attributes defined by __slots__ have to be stored somewhere. This is done by storing the PyObject pointers for the values of those attributes directly in the PyObject struct, much as is done with types written in C. Each entry in __slots__ will thus take up a PyObject *, regardless of whether the attribute is set or not.
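The cost of one pointer per slot can be checked directly. A sketch assuming CPython 3; OneSlot, TwoSlots and POINTER_SIZE are names made up for this illustration.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")   # size of a C pointer on this build

class OneSlot(object):
    __slots__ = ("a",)

class TwoSlots(object):
    __slots__ = ("a", "b")

one, two = OneSlot(), TwoSlots()

# No per-instance dict exists, so arbitrary attributes are rejected.
has_dict = hasattr(one, "__dict__")
try:
    one.other = 1
    rejected = False
except AttributeError:
    rejected = True

# Each slot costs one pointer in the struct, even while it is still unset.
extra = sys.getsizeof(two) - sys.getsizeof(one)

print(has_dict, rejected, extra == POINTER_SIZE)
```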
All that said, the problem remains that since everything in Python is an object and everything that holds an object just holds a reference, it's sometimes very difficult to draw the line between objects. Two objects can refer to the same bit of data. They may hold the only two references to that data. Getting rid of both objects also gets rid of the data. Do they both own the data? Does only one of them, but if so, which one? Or would you say they own half the data, even though getting rid of one object doesn't release half the data? Weakrefs can make this even more complicated: two objects can refer to the same data, but deleting one of the objects may cause the other object to also get rid of its reference to that data, causing the data to be cleaned up after all.
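The ownership question is easy to reproduce. In this sketch, Data and Owner are hypothetical classes, and the weak reference is only used as a probe to see whether the shared object is still alive. It relies on CPython's reference counting freeing objects as soon as the last reference goes away; other implementations may release the object later.

```python
import weakref

class Data(object):
    """Stand-in for some payload that two owners share."""

class Owner(object):
    def __init__(self, data):
        self.data = data

payload = Data()
first, second = Owner(payload), Owner(payload)
probe = weakref.ref(payload)   # observes the payload without keeping it alive
del payload                    # now only the two owners refer to it

del first
alive_after_one = probe() is not None

del second
alive_after_both = probe() is not None

print(alive_after_one, alive_after_both)
```

Deleting one owner releases nothing; deleting the second releases everything, so attributing "half" of the payload to each owner does not describe what actually happens.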
Fortunately the common case is fairly easy to figure out. There are memory debuggers for Python that do a reasonable job at keeping track of these things, like heapy. And as long as your class (and its base classes) is reasonably simple, you can make an educated guess at how much memory it would take up -- especially in large numbers. If you really want to know the exact sizes of your data structures, consult the CPython source; most builtin types are simple structs described in Include/<type>object.h and implemented in Objects/<type>object.c. The PyObject struct itself is described in Include/object.h. Just keep in mind: it's pointers all the way down; those take up room too.