I have a script that makes many calls to a dictionary using a key consisting of two variables. I know that my program will encounter the two variables again in the reverse order which makes storing the key as a tuple feasible. (Creating a matrix with the same labels for rows and columns)
Therefore, I was wondering if there was a performance difference in using a tuple over a frozenset for a dictionary key.
It is well-known that in Python tuples are faster than lists, and dicts are faster than objects.
tuples are immutable lists , frozensets are immutable sets . frozensets aren't indexed, but you have the functionality of sets - O(1) element lookups, and functionality such as unions and intersections. They also can't contain duplicates, like their mutable counterparts.
A tuple containing a list cannot be used as a key in a dictionary. Answer: True. A list is mutable. Therefore, a tuple containing a list cannot be used as a key in a dictionary.
The frozenset type is immutable and hashable — its contents cannot be altered after it is created; it can therefore be used as a dictionary key or as an element of another set.
In a quick test, apparently it makes a negligible difference.
python -m timeit -s "keys = list(zip(range(10000), range(10, 10000)))" -s "values = range(10000)" -s "a=dict(zip(keys, values))" "for i in keys:" " _ = a[i]"
1000 loops, best of 3: 855 usec per loop
python -m timeit -s "keys = [frozenset(i) for i in zip(range(10000), range(10, 10000))]" -s "values = range(10000)" -s "a=dict(zip(keys, values))" "for i in keys:" " _ = a[i]"
1000 loops, best of 3: 848 usec per loop
I really would just go with what is best elsewhere in your code.
Without having done any tests, I have a few guesses. For frozenset
s, cpython stores the hash after it has been calculated; furthermore, iterating over a set of any kind incurs extra overhead because the data is stored sparsely. In a 2-item set, that imposes a significant performance penalty on the first hash, but would probably make the second hash very fast -- at least when the object itself is the same. (i.e. is not a new but equivalent frozenset.)
For tuple
s, cpython does not store the hash, but rather calculates it every time. So it might be that repeated hashing is slightly cheaper with frozensets. But for such a short tuple, there's probably almost no difference; it's even possible that very short tuples will be faster.
Lattyware's current timings line up reasonably well with my line of reasoning here; see below.
To test my intuition about the asymmetry of hashing new vs. old frozensets, I did the following. I believe the difference in timings is exclusively due to the extra hash time. Which is pretty insignificant, by the way:
>>> fs = frozenset((1, 2))
>>> old_fs = lambda: [frozenset((1, 2)), fs][1]
>>> new_fs = lambda: [frozenset((1, 2)), fs][0]
>>> id(fs) == id(old_fs())
True
>>> id(fs) == id(new_fs())
False
>>> %timeit hash(old_fs())
1000000 loops, best of 3: 642 ns per loop
>>> %timeit hash(new_fs())
1000000 loops, best of 3: 660 ns per loop
Note that my previous timings were wrong; using and
created a timing asymmetry that the above method avoids. This new method produces expected results for tuples here -- negligable timing difference:
>>> tp = (1, 2)
>>> old_tp = lambda: [tuple((1, 2)), tp][1]
>>> new_tp = lambda: [tuple((1, 2)), tp][0]
>>> id(tp) == id(old_tp())
True
>>> id(tp) == id(new_tp())
False
>>> %timeit hash(old_tp())
1000000 loops, best of 3: 533 ns per loop
>>> %timeit hash(new_tp())
1000000 loops, best of 3: 532 ns per loop
And, the coup de grace, comparing hash time for a pre-constructred frozenset to hash time for a pre-constructed tuple:
>>> %timeit hash(fs)
10000000 loops, best of 3: 82.2 ns per loop
>>> %timeit hash(tp)
10000000 loops, best of 3: 93.6 ns per loop
Lattyware's results look more like this because they are an average of results for new and old frozensets. (They hash each tuple or frozenset twice, once in creating the dictionary, once in accessing it.)
The upshot of all this is that it probably doesn't matter, except to those of us who enjoy digging around in Python's internals and testing things into oblivion.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With