Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a performance difference in using a tuple over a frozenset as a key for a dictionary?

Tags:

python

I have a script that makes many calls to a dictionary using a key consisting of two variables. I know that my program will encounter the two variables again in the reverse order which makes storing the key as a tuple feasible. (Creating a matrix with the same labels for rows and columns)

Therefore, I was wondering if there was a performance difference in using a tuple over a frozenset for a dictionary key.

like image 912
chrtan Avatar asked May 01 '12 13:05

chrtan


People also ask

Which is faster tuple or dictionary?

It is well-known that in Python tuples are faster than lists, and dicts are faster than objects.

What is the difference between tuple and Frozenset?

tuples are immutable lists , frozensets are immutable sets . frozensets aren't indexed, but you have the functionality of sets - O(1) element lookups, and functionality such as unions and intersections. They also can't contain duplicates, like their mutable counterparts.

Can tuple be used as a key to dictionary explain?

A tuple containing a list cannot be used as a key in a dictionary. Answer: True. A list is mutable. Therefore, a tuple containing a list cannot be used as a key in a dictionary.

Can a Frozenset be a key for dictionary?

The frozenset type is immutable and hashable — its contents cannot be altered after it is created; it can therefore be used as a dictionary key or as an element of another set.


2 Answers

In a quick test, apparently it makes a negligible difference.

python -m timeit -s "keys = list(zip(range(10000), range(10, 10000)))" -s "values = range(10000)" -s "a=dict(zip(keys, values))" "for i in keys:" "  _ = a[i]"
1000 loops, best of 3: 855 usec per loop

python -m timeit -s "keys = [frozenset(i) for i in zip(range(10000), range(10, 10000))]" -s "values = range(10000)" -s "a=dict(zip(keys, values))" "for i in keys:" "  _ = a[i]"
1000 loops, best of 3: 848 usec per loop

I really would just go with what is best elsewhere in your code.

like image 79
Gareth Latty Avatar answered Sep 30 '22 15:09

Gareth Latty


Without having done any tests, I have a few guesses. For frozensets, cpython stores the hash after it has been calculated; furthermore, iterating over a set of any kind incurs extra overhead because the data is stored sparsely. In a 2-item set, that imposes a significant performance penalty on the first hash, but would probably make the second hash very fast -- at least when the object itself is the same. (i.e. is not a new but equivalent frozenset.)

For tuples, cpython does not store the hash, but rather calculates it every time. So it might be that repeated hashing is slightly cheaper with frozensets. But for such a short tuple, there's probably almost no difference; it's even possible that very short tuples will be faster.

Lattyware's current timings line up reasonably well with my line of reasoning here; see below.

To test my intuition about the asymmetry of hashing new vs. old frozensets, I did the following. I believe the difference in timings is exclusively due to the extra hash time. Which is pretty insignificant, by the way:

>>> fs = frozenset((1, 2))
>>> old_fs = lambda: [frozenset((1, 2)), fs][1]
>>> new_fs = lambda: [frozenset((1, 2)), fs][0]
>>> id(fs) == id(old_fs())
True
>>> id(fs) == id(new_fs())
False
>>> %timeit hash(old_fs())
1000000 loops, best of 3: 642 ns per loop
>>> %timeit hash(new_fs())
1000000 loops, best of 3: 660 ns per loop

Note that my previous timings were wrong; using and created a timing asymmetry that the above method avoids. This new method produces expected results for tuples here -- negligable timing difference:

>>> tp = (1, 2)
>>> old_tp = lambda: [tuple((1, 2)), tp][1]
>>> new_tp = lambda: [tuple((1, 2)), tp][0]
>>> id(tp) == id(old_tp())
True
>>> id(tp) == id(new_tp())
False
>>> %timeit hash(old_tp())
1000000 loops, best of 3: 533 ns per loop
>>> %timeit hash(new_tp())
1000000 loops, best of 3: 532 ns per loop

And, the coup de grace, comparing hash time for a pre-constructred frozenset to hash time for a pre-constructed tuple:

>>> %timeit hash(fs)
10000000 loops, best of 3: 82.2 ns per loop
>>> %timeit hash(tp)
10000000 loops, best of 3: 93.6 ns per loop

Lattyware's results look more like this because they are an average of results for new and old frozensets. (They hash each tuple or frozenset twice, once in creating the dictionary, once in accessing it.)

The upshot of all this is that it probably doesn't matter, except to those of us who enjoy digging around in Python's internals and testing things into oblivion.

like image 29
senderle Avatar answered Sep 30 '22 15:09

senderle