What is more efficient in terms of memory and speed: d[(first, second)] or d[first][second], where d is either a single dictionary keyed by tuples or a dictionary of dictionaries?
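To make the two layouts concrete, here is a minimal sketch; the keys and values ("alice", the scores) are invented for illustration:

# Layout 1: one flat dictionary keyed by (first, second) tuples.
flat = {("alice", "math"): 91, ("alice", "physics"): 87}
print(flat["alice", "math"])      # one probe: hash the tuple once -> 91

# Layout 2: a dictionary of dictionaries.
nested = {"alice": {"math": 91, "physics": 87}}
print(nested["alice"]["math"])    # two probes: hash "alice", then "math" -> 91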
For repeatedly looking up data with millions of entries, dictionaries are the fastest option in Python: they are the built-in mapping type and are therefore highly optimized.
Dictionaries are Python's implementation of a data structure that is more generally known as an associative array. A dictionary consists of a collection of key-value pairs. Each key-value pair maps the key to its associated value.
If you choose nested dictionaries, setdefault() is a convenient way to build them: it creates an empty dictionary at the first nesting level on demand, so there is no need to define the inner dictionaries explicitly.
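A minimal sketch of that pattern (the keys are again invented for illustration):

d = {}
# setdefault returns d["alice"] if it exists; otherwise it inserts
# the empty dict passed as the default and returns that.
d.setdefault("alice", {})["math"] = 91
d.setdefault("alice", {})["physics"] = 87   # inner dict already exists, so it is reused
print(d)   # {'alice': {'math': 91, 'physics': 87}}

Note that setdefault constructs the throwaway empty dict on every call even when the key already exists; for hot paths, collections.defaultdict(dict) avoids that.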
As of Python version 3.7, dictionaries are ordered. In Python 3.6 and earlier, dictionaries are unordered. When we say that dictionaries are ordered, it means that the items have a defined order, and that order will not change.
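For example, on Python 3.7 or later:

d = {}
d["zebra"] = 1
d["apple"] = 2
print(list(d))   # ['zebra', 'apple']: insertion order, not sorted order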
Here is some very basic test data indicating that, for a very contrived example (storing 'a' a million times using numbers as keys), a dictionary of dictionaries is significantly faster.
$ python -m timeit 'd = {i:{j:"a" for j in range(1000)} for i in range(1000)};a = [d[i][j] for j in range(1000) for i in range(1000)];'
10 loops, best of 3: 316 msec per loop
$ python -m timeit 'd = {(i, j):"a" for j in range(1000) for i in range(1000)};a = [d[i, j] for j in range(1000) for i in range(1000)];'
10 loops, best of 3: 970 msec per loop
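Those figures are from an older interpreter (the "best of 3" output is Python 2 style). On a current Python 3 you can reproduce the comparison with the timeit module, separating the build cost from the lookup cost; a sketch (absolute numbers will vary by machine and version):

import timeit

nested_build = "d = {i: {j: 'a' for j in range(1000)} for i in range(1000)}"
tuple_build  = "d = {(i, j): 'a' for i in range(1000) for j in range(1000)}"

# Cost of building each structure.
print(timeit.timeit(nested_build, number=10))
print(timeit.timeit(tuple_build, number=10))

# Cost of the million lookups alone, with the build moved into setup=.
print(timeit.timeit("[d[i][j] for i in range(1000) for j in range(1000)]",
                    setup=nested_build, number=10))
print(timeit.timeit("[d[i, j] for i in range(1000) for j in range(1000)]",
                    setup=tuple_build, number=10))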
Of course, these tests do not necessarily mean much for what you are actually trying to do. Determine what you'll be storing, and then test.
A little more data:
$ python -m timeit 'a = [(hash(i), hash(j)) for i in range(1000) for j in range(1000)]'
10 loops, best of 3: 304 msec per loop
$ python -m timeit 'a = [hash((i, j)) for i in range(1000) for j in range(1000)]'
10 loops, best of 3: 172 msec per loop
$ python -m timeit 'd = {i:{j:"a" for j in range(1000)} for i in range(1000)}'
10 loops, best of 3: 101 msec per loop
$ python -m timeit 'd = {(i, j):"a" for j in range(1000) for i in range(1000)}'
10 loops, best of 3: 645 msec per loop
Once again, this is clearly not indicative of real-world use, but it seems that the cost of building a dictionary with tuple keys like that is huge, and that is where the dictionary-in-a-dictionary wins out. This surprises me; I was expecting completely different results. I'll have to try a few more things when I have time.
Somewhat surprisingly, the dictionary of dictionaries is faster than the tuple-keyed dictionary in both CPython 2.7 and PyPy 1.8.
I didn't check on space, but you can do that with ps.
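ps measures the whole process; from inside Python, tracemalloc gives a rough per-structure comparison. A sketch (the sizes it reports are approximate and vary by interpreter):

import tracemalloc

tracemalloc.start()
d = {(i, j): "a" for i in range(1000) for j in range(1000)}
current, peak = tracemalloc.get_traced_memory()
print("tuple-keyed dict: ~%.1f MiB" % (current / 2**20))
tracemalloc.stop()

tracemalloc.start()
d = {i: {j: "a" for j in range(1000)} for i in range(1000)}
current, peak = tracemalloc.get_traced_memory()
print("dict of dicts:    ~%.1f MiB" % (current / 2**20))
tracemalloc.stop()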