Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is the size of an empty dict same as that of a non empty dict in Python?

This may be trivial, but I'm not sure I understand, I tried googling around but did not find a convincing answer.

>>> sys.getsizeof({})
140
>>> sys.getsizeof({'Hello':'World'})
140
>>>
>>> yet_another_dict = {}
>>> for i in xrange(5000):
        yet_another_dict[i] = i**2

>>> 
>>> sys.getsizeof(yet_another_dict)
98444

How do I understand this? Why is an empty dict the same size as that of a non empty dict?

like image 951
ComputerFellow Avatar asked Sep 01 '13 13:09

ComputerFellow


People also ask

What does an empty dictionary mean in Python?

In Python, an empty dictionary means that it does not contain key-value pair elements. In this example to create an empty dictionary, we can use a dict constructor and this method takes no arguments. If no argument is passed then it creates an empty dictionary.

How do you know if a dictionary is not empty?

# Checking if a dictionary is empty by checking its length empty_dict = {} if len(empty_dict) == 0: print('This dictionary is empty! ') else: print('This dictionary is not empty!

Does empty dict return false?

If the dictionary is empty, it returns None which is not == False .

Can dictionaries be empty?

An empty dictionary (or other containers) will evaluate to a Boolean False . You can do that by testing just the dictionary variable or using the bool() function. Both methods are shown in the following code example. Also, it is possible to use the len() function to test whether the dictionary is empty.


2 Answers

There are two reasons for that:

  1. Dictionary only holds references to the objects, not the objects themselves, so it's size is no correlated with the size of objects it contains, but with by the number of references (items) the dictionary contains.

  2. More important, dictionary preallocates memory for the references in chunks. So when you created a dictionary it already preallocates the memory for the first n references. When it fills up the memory it preallocates a new chunk.

You can observe that behaviour, running the next peace of code.

d = {}
size = sys.getsizeof(d)
print size
i = 0
j = 0
while i < 3:
    d[j] = j
    j += 1
    new_size = sys.getsizeof(d)
    if size != new_size:
        print new_size
        size = new_size
        i += 1

Which prints out:

280
1048
3352
12568

On my machine, but this depends on the architecture (32bit, 64bit).

like image 197
Viktor Kerkez Avatar answered Oct 23 '22 02:10

Viktor Kerkez


Dictionaries in CPython allocate a small amount of key space directly in the dictionary object itself (4-8 entries depending on version and compilation options). From dictobject.h:

/* PyDict_MINSIZE is the minimum size of a dictionary.  This many slots are
 * allocated directly in the dict object (in the ma_smalltable member).
 * It must be a power of 2, and at least 4.  8 allows dicts with no more
 * than 5 active entries to live in ma_smalltable (and so avoid an
 * additional malloc); instrumentation suggested this suffices for the
 * majority of dicts (consisting mostly of usually-small instance dicts and
 * usually-small dicts created to pass keyword arguments).
 */
#ifndef Py_LIMITED_API
#define PyDict_MINSIZE 8

Note that CPython also resizes the dictionary in batches to avoid frequent reallocations for growing dictionaries. From dictobject.c:

/* If we added a key, we can safely resize.  Otherwise just return!
 * If fill >= 2/3 size, adjust size.  Normally, this doubles or
 * quaduples the size, but it's also possible for the dict to shrink
 * (if ma_fill is much larger than ma_used, meaning a lot of dict
 * keys have been * deleted).
 *
 * Quadrupling the size improves average dictionary sparseness
 * (reducing collisions) at the cost of some memory and iteration
 * speed (which loops over every possible entry).  It also halves
 * the number of expensive resize operations in a growing dictionary.
 *
 * Very large dictionaries (over 50K items) use doubling instead.
 * This may help applications with severe memory constraints.
 */
if (!(mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2))
    return 0;
return dictresize(mp, (mp->ma_used > 50000 ? 2 : 4) * mp->ma_used);
like image 31
nneonneo Avatar answered Oct 23 '22 03:10

nneonneo