Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does Python2.7 dict use more space than Python3 dict?

I've read about Raymond Hettinger's new method of implementing compact dicts. This explains why dicts in Python 3.6 use less memory than dicts in Python 2.7-3.5. However there seems to be a difference between the memory used in Python 2.7 and 3.3-3.5 dicts. Test code:

import sys

d = {i: i for i in range(n)}
print(sys.getsizeof(d))
  • Python 2.7: 12568
  • Python 3.5: 6240
  • Python 3.6: 4704

As mentioned I understand the savings between 3.5 and 3.6 but am curious about the cause of the savings between 2.7 and 3.5.

like image 799
Alex Avatar asked Jul 24 '17 15:07

Alex


People also ask

How much space does a dictionary take Python?

On a 32-bit machine this makes it 12 bytes and on a 64-bit machine, 24 bytes.

Do dictionaries take up more space than lists?

Dictionary occupies much more space than a list of tuples. Even an empty dict occupies much space as compared to a list of tuples.

Should I use dict () or {}?

With CPython 2.7, using dict() to create dictionaries takes up to 6 times longer and involves more memory allocation operations than the literal syntax. Use {} to create dictionaries, especially if you are pre-populating them, unless the literal syntax does not work for your case.

Why use a dictionary instead of a list?

It is more efficient to use dictionaries for the lookup of elements as it is faster than a list and takes less time to traverse. Moreover, lists keep the order of the elements while dictionary does not.

Is it bad to have a dictionary in Python?

Not bad; given how often dictionaries are used in Python, it’s good to know that they don’t normally consume that much memory. What if I add something to the dict?

What are dictionaries in Python?

Dictionaries are Python’s implementation of a data structure that is more generally known as an associative array. A dictionary consists of a collection of key-value pairs. Each key-value pair maps the key to its associated value. You can define a dictionary by enclosing a comma-separated list of key-value pairs in curly braces ( {} ).

How much memory does a Python dictionary use?

We can find out with “ sys.getsizeof “: In other words, our dictionary, with nothing in it at all, consumes 240 bytes. Not bad; given how often dictionaries are used in Python, it’s good to know that they don’t normally consume that much memory.

What is the argument to Dict () in Python?

The argument to dict () should be a sequence of key-value pairs. A list of tuples works well for this: >>> MLB_team = dict( [ ... ('Colorado', 'Rockies'), ... ('Boston', 'Red Sox'), ... ('Minnesota', 'Twins'), ... ('Milwaukee', 'Brewers'), ... ('Seattle', 'Mariners') ... ])


1 Answers

Turns out this is a red herring. The rules for increasing the size of dicts changed between cPython 2.7 - 3.2 and cPython 3.3 and again at cPython 3.4 (though this change only applies when deletions occur). We can see this using the following code to determine when the dict expands:

import sys

size_old = 0
for n in range(512):
    d = {i: i for i in range(n)}
    size = sys.getsizeof(d)
    if size != size_old:
        print(n, size_old, size)
    size_old = size

Python 2.7:

(0, 0, 280)
(6, 280, 1048)
(22, 1048, 3352)
(86, 3352, 12568)

Python 3.5

0 0 288
6 288 480
12 480 864
22 864 1632
44 1632 3168
86 3168 6240

Python 3.6:

0 0 240
6 240 368
11 368 648
22 648 1184
43 1184 2280
86 2280 4704

Keeping in mind that dicts resize when they get to be 2/3 full, we can see that the cPython 2.7 dict implementation quadruples in size when it expands while the cPython 3.5/3.6 dict implementations only double in size.

This is explained in a comment in the dict source code:

/* GROWTH_RATE. Growth rate upon hitting maximum load.
 * Currently set to used*2 + capacity/2.
 * This means that dicts double in size when growing without deletions,
 * but have more head room when the number of deletions is on a par with the
 * number of insertions.
 * Raising this to used*4 doubles memory consumption depending on the size of
 * the dictionary, but results in half the number of resizes, less effort to
 * resize.
 * GROWTH_RATE was set to used*4 up to version 3.2.
 * GROWTH_RATE was set to used*2 in version 3.3.0
 */
like image 190
Alex Avatar answered Oct 13 '22 18:10

Alex