Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In what structure is a Python object stored in memory?

Say I have a class A:

class A(object):
    def __init__(self, x):
        self.x = x

    def __str__(self):
        return self.x

And I use sys.getsizeof to see how many bytes instance of A takes:

>>> sys.getsizeof(A(1))
64
>>> sys.getsizeof(A('a'))
64
>>> sys.getsizeof(A('aaa'))
64

As illustrated in the experiment above, the size of an A object is the same no matter what self.x is.

So I wonder how python store an object internally?

like image 892
satoru Avatar asked Oct 31 '10 10:10

satoru


1 Answers

It depends on what kind of object, and also which Python implementation :-)

In CPython, which is what most people use when they use python, all Python objects are represented by a C struct, PyObject. Everything that 'stores an object' really stores a PyObject *. The PyObject struct holds the bare minimum information: the object's type (a pointer to another PyObject) and its reference count (an ssize_t-sized integer.) Types defined in C extend this struct with extra information they need to store in the object itself, and sometimes allocate extra data separately.

For example, tuples (implemented as a PyTupleObject "extending" a PyObject struct) store their length and the PyObject pointers they contain inside the struct itself (the struct contains a 1-length array in the definition, but the implementation allocates a block of memory of the right size to hold the PyTupleObject struct plus exactly as many items as the tuple should hold.) The same way, strings (PyStringObject) store their length, their cached hashvalue, some string-caching ("interning") bookkeeping, and the actual char* of their data. Tuples and strings are thus single blocks of memory.

On the other hand, lists (PyListObject) store their length, a PyObject ** for their data and another ssize_t to keep track of how much room they allocated for the data. Because Python stores PyObject pointers everywhere, you can't grow a PyObject struct once it's allocated -- doing so may require the struct to move, which would mean finding all pointers and updating them. Because a list may need to grow, it has to allocate the data separately from the PyObject struct. Tuples and strings cannot grow, and so they don't need this. Dicts (PyDictObject) work the same way, although they store the key, the value and the cached hashvalue of the key, instead of just the items. Dict also have some extra overhead to accommodate small dicts and specialized lookup functions.

But these are all types in C, and you can usually see how much memory they would use just by looking at the C source. Instances of classes defined in Python rather than C are not so easy. The simplest case, instances of classic classes, is not so difficult: it's a PyObject that stores a PyObject * to its class (which is not the same thing as the type stored in the PyObject struct already), a PyObject * to its __dict__ attribute (which holds all other instance attributes) and a PyObject * to its weakreflist (which is used by the weakref module, and only initialized if necessary.) The instance's __dict__ is usually unique to the instance, so when calculating the "memory size" of such an instance you usually want to count the size of the attribute dict as well. But it doesn't have to be specific to the instance! __dict__ can be assigned to just fine.

New-style classes complicate manners. Unlike with classic classes, instances of new-style classes are not separate C types, so they do not need to store the object's class separately. They do have room for the __dict__ and weakreflist reference, but unlike classic instances they don't require the __dict__ attribute for arbitrary attributes. if the class (and all its baseclasses) use __slots__ to define a strict set of attributes, and none of those attributes is named __dict__, the instance does not allow arbitrary attributes and no dict is allocated. On the other hand, attributes defined by __slots__ have to be stored somewhere. This is done by storing the PyObject pointers for the values of those attributes directly in the PyObject struct, much like is done with types written in C. Each entry in __slots__ will thus take up a PyObject *, regardless of whether the attribute is set or not.

All that said, the problem remains that since everything in Python is an object and everything that holds an object just holds a reference, it's sometimes very difficult to draw the line between objects. Two objects can refer to the same bit of data. They may hold the only two references to that data. Getting rid of both objects also gets rid of the data. Do they both own the data? Does only one of them, but if so, which one? Or would you say they own half the data, even though getting rid of one object doesn't release half the data? Weakrefs can make this even more complicated: two objects can refer to the same data, but deleting one of the objects may cause the other object to also get rid of its reference to that data, causing the data to be cleaned up after all.

Fortunately the common case is fairly easy to figure out. There are memory debuggers for Python that do a reasonable job at keeping track of these things, like heapy. And as long as your class (and its baseclasses) is reasonably simple, you can make an educated guess at how much memory it would take up -- especially in large numbers. If you really want to know the exact sizes of your datastructures, consult the CPython source; most builtin types are simple structs described in Include/<type>object.h and implemented in Objects/<type>object.c. The PyObject struct itself is described in Include/object.h. Just keep in mind: it's pointers all the way down; those take up room too.

like image 81
Thomas Wouters Avatar answered Nov 03 '22 02:11

Thomas Wouters