Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Two-dimensional vs. One-dimensional dictionary efficiency in Python

What is more efficient in terms of memory and speed between

d[(first,second)]

and

d[first][second],

where d is a dictionary of either tuples or dictionaries?

like image 691
Zach Avatar asked Apr 16 '12 22:04

Zach


People also ask

Is dictionary efficient in Python?

The fastest way to repeatedly lookup data with millions of entries in Python is using dictionaries. Because dictionaries are the built-in mapping type in Python thereby they are highly optimized.

What is true about dictionary in Python?

Dictionaries are Python's implementation of a data structure that is more generally known as an associative array. A dictionary consists of a collection of key-value pairs. Each key-value pair maps the key to its associated value.

How do you create a multidimensional dictionary in Python?

Method #1 : Using setdefault() This function is used to define an empty dictionary on 1st nested level of dictionary to make it 2D. In this case there is no need to define explicit dictionaries at that level of dictionary.

Is dictionary unordered in Python?

As of Python version 3.7, dictionaries are ordered. In Python 3.6 and earlier, dictionaries are unordered. When we say that dictionaries are ordered, it means that the items have a defined order, and that order will not change.


2 Answers

Here is some very basic test data that indicates that for a very contrived example(storing 'a' a million times using numbers as keys) using 2 dictionaries is significantly faster.

$ python -m timeit 'd = {i:{j:"a" for j in range(1000)} for i in range(1000)};a = [d[i][j] for j in range(1000) for i in range(1000)];'
10 loops, best of 3: 316 msec per loop
$ python -m timeit 'd = {(i, j):"a" for j in range(1000) for i in range(1000)};a = [d[i, j] for j in range(1000) for i in range(1000)];'
10 loops, best of 3: 970 msec per loop

Of course, these tests do not necessarily mean anything depending on what you are trying to do. Determine what you'll be storing, and then test.

A little more data:

$ python -m timeit 'a = [(hash(i), hash(j)) for i in range(1000) for j in range(1000)]'
10 loops, best of 3: 304 msec per loop
$ python -m timeit 'a = [hash((i, j)) for i in range(1000) for j in range(1000)]'
10 loops, best of 3: 172 msec per loop
$ python -m timeit 'd = {i:{j:"a" for j in range(1000)} for i in range(1000)}'
10 loops, best of 3: 101 msec per loop
$ python -m timeit 'd = {(i, j):"a" for j in range(1000) for i in range(1000)}'
10 loops, best of 3: 645 msec per loop

Once again this is clearly not indicative of real world use, but it seems to me like the cost of building a dictionary with tuples like that is huge and that's where the dictionary in a dictionary wins out. This surprises me, I was expecting completely different results. I'll have to try a few more things when I have time.

like image 57
Nolen Royalty Avatar answered Oct 24 '22 10:10

Nolen Royalty


A little surprisingly, the dictionary of dictionaries is faster than the tuple in both CPython 2.7 and Pypy 1.8.

I didn't check on space, but you can do that with ps.

like image 37
user1277476 Avatar answered Oct 24 '22 09:10

user1277476