Why is the size of an empty dict same as that of a non empty dict in Python?

Tags: python, dictionary, memory, python-2.7

This may be trivial, but I'm not sure I understand it. I tried googling around but did not find a convincing answer.

>>> import sys
>>> sys.getsizeof({})
140
>>> sys.getsizeof({'Hello':'World'})
140
>>> yet_another_dict = {}
>>> for i in xrange(5000):
...     yet_another_dict[i] = i**2
...
>>> sys.getsizeof(yet_another_dict)
98444

How should I understand this? Why is an empty dict the same size as a non-empty one?

asked Sep 01 '13 13:09

ComputerFellow

2 Answers

There are two reasons for that:

1. A dictionary only holds references to objects, not the objects themselves, so its size is not correlated with the size of the objects it contains, but with the number of references (items) it holds.
2. More importantly, a dictionary preallocates memory for the references in chunks. When you create a dictionary, it already preallocates memory for the first n references. When that memory fills up, it preallocates a new, larger chunk.

You can observe this behaviour by running the following piece of code:

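import sys

d = {}
size = sys.getsizeof(d)
print(size)  # size of the empty dict
resizes = 0  # number of size changes seen so far
key = 0      # next key to insert
while resizes < 3:
    d[key] = key
    key += 1
    new_size = sys.getsizeof(d)
    if size != new_size:
        # the dict has just grown: report the new size
        print(new_size)
        size = new_size
        resizes += 1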

On my machine this prints:

280
1048
3352
12568

The exact values depend on the architecture (32-bit or 64-bit) and on the Python version.
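The first reason can be checked in a similar way. The snippet below is only a sketch and assumes CPython; the names `small` and `large` are made up for the illustration. It compares two one-item dicts whose values differ hugely in size:

```python
import sys

# Two dicts with the same single key; only the value differs in size.
small = {'key': 'x'}
large = {'key': 'x' * 1000000}  # the value is a string of about 1 MB

# getsizeof() counts the dict's own table, not the objects it refers to,
# so both dicts should report the same size on CPython.
print(sys.getsizeof(small) == sys.getsizeof(large))

# The big string itself is far larger than the dict that references it.
print(sys.getsizeof(large['key']) > sys.getsizeof(large))
```

Both comparisons should come out true, since `sys.getsizeof` reports only the dictionary's own storage and not the objects reachable from it.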

answered Oct 23 '22 02:10

Viktor Kerkez

Dictionaries in CPython allocate a small amount of key space directly in the dictionary object itself (4-8 entries depending on version and compilation options). From dictobject.h:

/* PyDict_MINSIZE is the minimum size of a dictionary.  This many slots are
 * allocated directly in the dict object (in the ma_smalltable member).
 * It must be a power of 2, and at least 4.  8 allows dicts with no more
 * than 5 active entries to live in ma_smalltable (and so avoid an
 * additional malloc); instrumentation suggested this suffices for the
 * majority of dicts (consisting mostly of usually-small instance dicts and
 * usually-small dicts created to pass keyword arguments).
 */
#ifndef Py_LIMITED_API
#define PyDict_MINSIZE 8

Note that CPython also resizes the dictionary in batches to avoid frequent reallocations for growing dictionaries. From dictobject.c:

/* If we added a key, we can safely resize.  Otherwise just return!
 * If fill >= 2/3 size, adjust size.  Normally, this doubles or
 * quaduples the size, but it's also possible for the dict to shrink
 * (if ma_fill is much larger than ma_used, meaning a lot of dict
 * keys have been deleted).
 *
 * Quadrupling the size improves average dictionary sparseness
 * (reducing collisions) at the cost of some memory and iteration
 * speed (which loops over every possible entry).  It also halves
 * the number of expensive resize operations in a growing dictionary.
 *
 * Very large dictionaries (over 50K items) use doubling instead.
 * This may help applications with severe memory constraints.
 */
if (!(mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2))
    return 0;
return dictresize(mp, (mp->ma_used > 50000 ? 2 : 4) * mp->ma_used);
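To see what this rule means in practice, here is a rough pure-Python simulation of it. It is a sketch and not CPython's code: `simulate_growth` and `PYDICT_MINSIZE` are names invented for the illustration, it assumes keys are only inserted and never deleted (so `ma_fill` equals `ma_used`), and the rounding up to a power of two is how I recall `dictresize` working in 2.7 rather than something shown in the excerpt.

```python
PYDICT_MINSIZE = 8  # initial table size, from the dictobject.h excerpt


def simulate_growth(n_inserts):
    """Return (items_in_dict, new_table_size) for each resize.

    Models insert-only use, so fill and used are the same number.
    """
    size = PYDICT_MINSIZE
    used = 0
    resizes = []
    for _ in range(n_inserts):
        used += 1
        # Resize once the table is at least 2/3 full.
        if used * 3 >= size * 2:
            # Ask for 4x the used count (2x for very large dicts).
            min_used = (2 if used > 50000 else 4) * used
            # Assumption: the new size is the smallest power of two
            # strictly greater than min_used, starting from the minimum.
            new_size = PYDICT_MINSIZE
            while new_size <= min_used:
                new_size <<= 1
            size = new_size
            resizes.append((used, size))
    return resizes


resizes = simulate_growth(5000)
for used, size in resizes:
    print(used, size)
```

Under these assumptions the table grows to 32, 128, 512, 2048 and 8192 slots on the 6th, 22nd, 86th, 342nd and 1366th insert. That lines up with the sizes reported in this thread: with a 280-byte dict object and 24-byte entries on a 64-bit build, 280 + 32*24 = 1048, 280 + 128*24 = 3352 and 280 + 512*24 = 12568, and on a 32-bit build (140-byte object, 12-byte entries) a 5000-item dict sits in an 8192-slot table, giving 140 + 8192*12 = 98444.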

answered Oct 23 '22 03:10

nneonneo
