This may be trivial, but I'm not sure I understand, I tried googling around but did not find a convincing answer.
>>> sys.getsizeof({})
140
>>> sys.getsizeof({'Hello':'World'})
140
>>>
>>> yet_another_dict = {}
>>> for i in xrange(5000):
yet_another_dict[i] = i**2
>>>
>>> sys.getsizeof(yet_another_dict)
98444
How do I understand this? Why is an empty dict the same size as that of a non empty dict?
In Python, an empty dictionary means that it does not contain key-value pair elements. In this example to create an empty dictionary, we can use a dict constructor and this method takes no arguments. If no argument is passed then it creates an empty dictionary.
# Checking if a dictionary is empty by checking its length empty_dict = {} if len(empty_dict) == 0: print('This dictionary is empty! ') else: print('This dictionary is not empty!
If the dictionary is empty, it returns None which is not == False .
An empty dictionary (or other containers) will evaluate to a Boolean False . You can do that by testing just the dictionary variable or using the bool() function. Both methods are shown in the following code example. Also, it is possible to use the len() function to test whether the dictionary is empty.
There are two reasons for that:
Dictionary only holds references to the objects, not the objects themselves, so it's size is no correlated with the size of objects it contains, but with by the number of references (items) the dictionary contains.
More important, dictionary preallocates memory for the references in chunks. So when you created a dictionary it already preallocates the memory for the first n
references. When it fills up the memory it preallocates a new chunk.
You can observe that behaviour, running the next peace of code.
d = {}
size = sys.getsizeof(d)
print size
i = 0
j = 0
while i < 3:
d[j] = j
j += 1
new_size = sys.getsizeof(d)
if size != new_size:
print new_size
size = new_size
i += 1
Which prints out:
280
1048
3352
12568
On my machine, but this depends on the architecture (32bit, 64bit).
Dictionaries in CPython allocate a small amount of key space directly in the dictionary object itself (4-8 entries depending on version and compilation options). From dictobject.h
:
/* PyDict_MINSIZE is the minimum size of a dictionary. This many slots are
* allocated directly in the dict object (in the ma_smalltable member).
* It must be a power of 2, and at least 4. 8 allows dicts with no more
* than 5 active entries to live in ma_smalltable (and so avoid an
* additional malloc); instrumentation suggested this suffices for the
* majority of dicts (consisting mostly of usually-small instance dicts and
* usually-small dicts created to pass keyword arguments).
*/
#ifndef Py_LIMITED_API
#define PyDict_MINSIZE 8
Note that CPython also resizes the dictionary in batches to avoid frequent reallocations for growing dictionaries. From dictobject.c
:
/* If we added a key, we can safely resize. Otherwise just return!
* If fill >= 2/3 size, adjust size. Normally, this doubles or
* quaduples the size, but it's also possible for the dict to shrink
* (if ma_fill is much larger than ma_used, meaning a lot of dict
* keys have been * deleted).
*
* Quadrupling the size improves average dictionary sparseness
* (reducing collisions) at the cost of some memory and iteration
* speed (which loops over every possible entry). It also halves
* the number of expensive resize operations in a growing dictionary.
*
* Very large dictionaries (over 50K items) use doubling instead.
* This may help applications with severe memory constraints.
*/
if (!(mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2))
return 0;
return dictresize(mp, (mp->ma_used > 50000 ? 2 : 4) * mp->ma_used);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With