
Why is the __dict__ of instances so much smaller in size in Python 3?

In Python 3, the dictionary created for an instance of a class (its __dict__) is tiny compared to an ordinary dictionary created with the same contents:

import sys

class Foo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

f = Foo(20, 30)

When using Python 3.5.2, the following calls to getsizeof produce:

>>> sys.getsizeof(vars(f))  # vars gets obj.__dict__
96
>>> sys.getsizeof(dict(vars(f)))
288

288 - 96 = 192 bytes saved!

Using Python 2.7.12, on the other hand, the same calls return:

>>> sys.getsizeof(vars(f))
280
>>> sys.getsizeof(dict(vars(f)))
280

0 bytes saved.

In both cases, the dictionaries obviously have exactly the same contents:

>>> vars(f) == dict(vars(f))
True

so the contents aren't a factor. Also, the size difference shows up in Python 3 only.

So, what's going on here? Why is the size of the __dict__ of an instance so tiny in Python 3?

Dimitris Fasarakis Hilliard asked Feb 23 '17 14:02



1 Answer

In short:

Instance __dict__s are implemented differently from the 'normal' dictionaries created with dict or {}. The dictionaries of the instances of a class share their keys and hashes and keep a separate array for the part that differs: the values. sys.getsizeof counts only those values when calculating the size of an instance dict.

A bit more:

Dictionaries in CPython are, as of Python 3.3, implemented in one of two forms:

  • Combined dictionary: Each value is stored alongside the key and hash of its entry (the me_value member of the PyDictKeyEntry struct). As far as I know, this form is used for dictionaries created with dict or {} and for module namespaces.
  • Split table: The values are stored separately in an array (ma_values of PyDictObject), while the keys and hashes are shared.

Instance dictionaries are always implemented in a split-table form (a Key-Sharing Dictionary) which allows instances of a given class to share the keys (and hashes) for their __dict__ and only differ in the corresponding values.
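The idea of a split table can be sketched with a toy model in pure Python. This is only an illustration of the layout described above, not CPython's implementation: the names SharedKeys and SplitDict are hypothetical, and real split tables also share the hashes and handle deletion and resizing.

```python
_MISSING = object()  # marks a slot that has no value yet


class SharedKeys:
    """Toy stand-in for the key table shared by all instances of one class."""

    def __init__(self):
        self.index = {}  # attribute name -> slot number

    def slot_for(self, name):
        # Assign the next free slot the first time a name is seen.
        return self.index.setdefault(name, len(self.index))


class SplitDict:
    """Toy per-instance part: only a values array; the keys live in SharedKeys."""

    def __init__(self, shared):
        self.shared = shared
        self.values = []

    def __setitem__(self, name, value):
        slot = self.shared.slot_for(name)
        # Grow the values array so that the slot exists (no-op if it already does).
        self.values.extend([_MISSING] * (slot + 1 - len(self.values)))
        self.values[slot] = value

    def __getitem__(self, name):
        slot = self.shared.index.get(name)
        if slot is None or slot >= len(self.values) or self.values[slot] is _MISSING:
            raise KeyError(name)
        return self.values[slot]


foo_keys = SharedKeys()            # one key table per class
d1, d2 = SplitDict(foo_keys), SplitDict(foo_keys)
d1["a"], d1["b"] = 20, 30
d2["a"], d2["b"] = 30, 40

print(d1.shared is d2.shared)      # both instances use the same key table
print(d1.values, d2.values)        # only the values differ per instance
```

Each additional instance in this model costs one values list; the mapping from names to slots is paid for once per class.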

This is all described in PEP 412 -- Key-Sharing Dictionary. The implementation of the split dictionary landed in Python 3.3, so earlier versions of the 3.x family, as well as Python 2.x, don't have it.

The implementation of __sizeof__ for dictionaries takes this fact into account and only considers the size that corresponds to the values array when calculating the size for a split dictionary.

Thankfully, it's self-explanatory:

Py_ssize_t size, res;

size = DK_SIZE(mp->ma_keys);
res = _PyObject_SIZE(Py_TYPE(mp));
if (mp->ma_values)                    /*Add the values to the result*/
    res += size * sizeof(PyObject*);
/* If the dictionary is split, the keys portion is accounted-for
   in the type object. */
if (mp->ma_keys->dk_refcnt == 1)     /* Add keys/hashes size to res */
    res += sizeof(PyDictKeysObject) + (size-1) * sizeof(PyDictKeyEntry);
return res;
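The arithmetic in that function can be replayed in Python to see where 96 and 288 come from. This is a rough sketch under stated assumptions: the struct sizes, the minimum table sizes (4 slots for a split table, 8 for a combined one) and the GC header that sys.getsizeof adds are values I believe hold for a 64-bit CPython 3.5 build, chosen because they reproduce the numbers in the question. They are not read from the interpreter, and the helper names are hypothetical.

```python
# Assumed sizes in bytes for a 64-bit CPython 3.5 build (not read from the interpreter).
SIZEOF_PTR = 8
SIZEOF_DICT_OBJECT = 40   # PyDictObject: object header + ma_used + ma_keys + ma_values
SIZEOF_KEYS_OBJECT = 56   # PyDictKeysObject, including its first entry
SIZEOF_KEY_ENTRY = 24     # PyDictKeyEntry: hash + key pointer + value pointer
SIZEOF_GC_HEAD = 24       # added by sys.getsizeof for GC-tracked objects


def dict_sizeof(table_size, has_values_array, keys_refcnt):
    """Mirror the C logic shown above with plain integers."""
    res = SIZEOF_DICT_OBJECT
    if has_values_array:                 # split table: count the values array
        res += table_size * SIZEOF_PTR
    if keys_refcnt == 1:                 # keys not shared: count keys and hashes
        res += SIZEOF_KEYS_OBJECT + (table_size - 1) * SIZEOF_KEY_ENTRY
    return res


def getsizeof_estimate(table_size, has_values_array, keys_refcnt):
    return dict_sizeof(table_size, has_values_array, keys_refcnt) + SIZEOF_GC_HEAD


# Instance dict: split table, keys also referenced by the class (refcount > 1).
split = getsizeof_estimate(table_size=4, has_values_array=True, keys_refcnt=2)
# dict(vars(f)): combined table that owns its keys.
combined = getsizeof_estimate(table_size=8, has_values_array=False, keys_refcnt=1)

print(split, combined, combined - split)  # 96 288 192
```

Under these assumptions, the split dictionary pays only for its header and a four-pointer values array, while the combined one pays for the whole entry table.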

As far as I know, split-table dictionaries are created only for the namespaces of instances; using dict() or {} (as also described in the PEP) always results in a combined dictionary that doesn't have these benefits.


As an aside, since it's fun, we can always break this optimization. There are two ways I've found so far: a silly one and a more plausible scenario:

  1. Being silly:

    >>> f = Foo(20, 30)
    >>> sys.getsizeof(vars(f))
    96
    >>> vars(f).update({1:1})  # add a non-string key
    >>> sys.getsizeof(vars(f))
    288

    Split tables only support string keys; adding a non-string key (which really makes zero sense) breaks this rule, and CPython turns the split table into a combined one, losing all memory gains.

  2. A scenario that might happen:

    >>> f1, f2 = Foo(20, 30), Foo(30, 40)
    >>> for i, j in enumerate([f1, f2]):
    ...    setattr(j, 'i'+str(i), i)  # a different attribute name per instance
    ...    print(sys.getsizeof(vars(j)))
    96
    288

    Inserting different keys into the instances of a class will eventually lead to the split table being combined. This doesn't apply only to the instances already created; all subsequent instances created from the class will have a combined dictionary instead of a split one.

    # after running previous snippet
    >>> sys.getsizeof(vars(Foo(100, 200)))
    288

Of course, there's no good reason, other than fun, to do this on purpose.
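To check how a particular interpreter behaves, the measurements from this answer can be collected in one small script. This is only a convenience sketch: the report function is a hypothetical helper, and the numbers it prints depend on the Python version and platform, so they will not necessarily match the 96 and 288 shown for 3.5.2, and the ordering of the sizes may differ in later releases.

```python
import sys


class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b


def report():
    """Return the sizes measured in this answer, keyed by a short label."""
    f = Foo(20, 30)
    sizes = {
        "instance __dict__": sys.getsizeof(vars(f)),
        "dict() copy": sys.getsizeof(dict(vars(f))),
    }
    vars(f).update({1: 1})  # the "silly" case: add a non-string key
    sizes["after non-string key"] = sys.getsizeof(vars(f))
    return sizes


sizes = report()
print("Python", ".".join(map(str, sys.version_info[:3])))
for label, size in sizes.items():
    print(label, size)
```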


If anyone is wondering, Python 3.6's dictionary implementation doesn't change this fact. The two aforementioned forms of dictionaries are still available, just further compacted (the implementation of dict.__sizeof__ also changed, so the values returned by getsizeof will differ somewhat).

Dimitris Fasarakis Hilliard answered Oct 20 '22 05:10
