Answering this question, I faced an interesting situation: two similar code snippets performed quite differently. I'm asking here just to understand the reason for that and to improve my intuition for such cases.
I'll adapt the code snippets for Python 2.7 (in Python 3 the performance differences are the same).
from collections import OrderedDict
from operator import itemgetter
from itertools import izip
items = OrderedDict([('a', 10), ('b', 9), ('c', 4), ('d', 7), ('e', 3), ('f', 0), ('g', -5), ('h', 9)])
def f1():
    return min(items, key=items.get)

def f2():
    return min(items.iteritems(), key=itemgetter(1))[0]
from timeit import Timer
N = 100000
print(Timer(stmt='f1()', setup='from __main__ import f1').timeit(number = N))
print(Timer(stmt='f2()', setup='from __main__ import f2').timeit(number = N))
Output:
0.603327797248
1.21580172899
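For reference, here is a minimal Python 3 equivalent of the two snippets (`iteritems` becomes `items`); both return the key whose value is smallest:

```python
from collections import OrderedDict
from operator import itemgetter

items = OrderedDict([('a', 10), ('b', 9), ('c', 4), ('d', 7),
                     ('e', 3), ('f', 0), ('g', -5), ('h', 9)])

def f1():
    # lookup-based: min() calls items.get(key) for every key
    return min(items, key=items.get)

def f2():
    # iteration-based: walk (key, value) pairs, compare values
    return min(items.items(), key=itemgetter(1))[0]

print(f1(), f2())  # g g  ('g' has the smallest value, -5)
```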
The first solution has to perform a lookup in the `OrderedDict` to get the value for each key. The second solution just iterates through the `OrderedDict`'s key-value pairs, which have to be packed into tuples. Yet the second solution is two times slower. Why is that?
I recently watched this video, where Raymond Hettinger says that Python tends to reuse tuples, so there are no extra allocations.
So, what does this performance issue boil down to?
I want to elaborate a bit on why I'm asking.
The first solution has to do dictionary lookups. That implies taking the key's hash, then finding the bin for that hash, then getting the key from that bin (hopefully there will be no key collisions), and then getting the value associated with that key.
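That hash-bin-key-value chain can be sketched with a toy linear-probing table (purely illustrative, Python 3; CPython's real dict uses open addressing with a more sophisticated probe sequence):

```python
def toy_insert(slots, key, value):
    """Insert into a fixed-size linear-probing table."""
    i = hash(key) % len(slots)            # hash -> bin
    while slots[i] is not None and slots[i][0] != key:
        i = (i + 1) % len(slots)          # collision: probe the next bin
    slots[i] = (key, value)

def toy_lookup(slots, key):
    """The hash -> bin -> key check -> value chain for one lookup."""
    i = hash(key) % len(slots)            # hash -> bin
    while slots[i] is not None:
        k, v = slots[i]
        if k == key:                      # key matches in this bin
            return v                      # -> value
        i = (i + 1) % len(slots)          # collision: probe the next bin
    raise KeyError(key)

slots = [None] * 8
toy_insert(slots, 'a', 10)
toy_insert(slots, 'g', -5)
print(toy_lookup(slots, 'g'))  # -5
```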
The second solution just goes through all the bins and yields all the keys in those bins. It goes through the bins one by one, without the overhead of computing which bin to take. Yes, it has to access the values associated with those keys, but each value is only one step from its key, while the first solution has to go through the hash-bin-key-value chain every time it needs a value. Each solution has to get the value: the first one gets it through the hash-bin-key-value chain, while the second gets it by following one more pointer when accessing the key. The only overhead of the second solution is that it has to temporarily store the value in a tuple along with the key. It turns out that this storing is the major reason for the overhead. Still, I don't fully understand why that is so, given that there is so-called "tuple reuse" (see the video mentioned above).
To my mind, the second solution has to save the value along with the key, but it saves us from having to do the hash-bin-key computation and access to get the value for that key.
The performance difference is mainly caused by `OrderedDict`. `OrderedDict` uses `dict`'s `get` and `__getitem__`, but redefines its own `__iter__` and `iteritems`:
def __iter__(self):
    'od.__iter__() <==> iter(od)'
    # Traverse the linked list in order.
    root = self.__root
    curr = root[1]                              # start at the first node
    while curr is not root:
        yield curr[2]                           # yield the curr[KEY]
        curr = curr[1]                          # move to next node

def iteritems(self):
    'od.iteritems -> an iterator over the (key, value) pairs in od'
    for k in self:
        yield (k, self[k])
Look at what we found: `self[k]`.
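We can observe that extra lookup directly. The sketch below (Python 3, with a hypothetical `CountingDict` helper) mimics the pure-Python `iteritems` recipe above and counts how many times `__getitem__` is hit:

```python
class CountingDict(dict):
    """dict subclass that counts __getitem__ calls (illustrative helper)."""
    lookups = 0

    def __getitem__(self, key):
        self.lookups += 1
        return super().__getitem__(key)

def iteritems_like(d):
    # same shape as OrderedDict.iteritems above: one d[k] lookup per key
    for k in d:
        yield (k, d[k])

d = CountingDict(a=1, b=2, c=3)
pairs = list(iteritems_like(d))
print(pairs)       # [('a', 1), ('b', 2), ('c', 3)]
print(d.lookups)   # 3 -- one full hash lookup per key
```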
So your second solution did not help us avoid the hash-bin-key computation after all. Meanwhile, the iterator generated by a plain `dict` (more precisely, `items.iteritems().next()` when `items` is a `dict`) does not make that computation. Moreover, `iteritems` is also more expensive:
from timeit import Timer
N = 1000
d = {i:i for i in range(10000)}
def f1():
    for k in d: pass

def f2():
    for k in d.iterkeys(): pass

def f3():
    for v in d.itervalues(): pass

def f4():
    for t in d.iteritems(): pass
print(Timer(stmt='f1()', setup='from __main__ import f1').timeit(number=N))
print(Timer(stmt='f2()', setup='from __main__ import f2').timeit(number=N))
print(Timer(stmt='f3()', setup='from __main__ import f3').timeit(number=N))
print(Timer(stmt='f4()', setup='from __main__ import f4').timeit(number=N))
Output
0.256800375467
0.265079360645
0.260599391822
0.492333103788
Compared to `iterkeys`' `dictiter_iternextkey` and `itervalues`' `dictiter_iternextvalue`, `iteritems`' `dictiter_iternextitem` has additional parts:
if (result->ob_refcnt == 1) {
    Py_INCREF(result);
    Py_DECREF(PyTuple_GET_ITEM(result, 0));
    Py_DECREF(PyTuple_GET_ITEM(result, 1));
} else {
    result = PyTuple_New(2);
    if (result == NULL)
        return NULL;
}
di->len--;
key = ep[i].me_key;
value = ep[i].me_value;
Py_INCREF(key);
Py_INCREF(value);
PyTuple_SET_ITEM(result, 0, key);
PyTuple_SET_ITEM(result, 1, value);
I think that tuple creation could decrease the performance.
Python indeed tends to reuse tuples. `tupleobject.c` shows:
/* Speed optimization to avoid frequent malloc/free of small tuples */
#ifndef PyTuple_MAXSAVESIZE
#define PyTuple_MAXSAVESIZE 20 /* Largest tuple to save on free list */
#endif
#ifndef PyTuple_MAXFREELIST
#define PyTuple_MAXFREELIST 2000 /* Maximum number of tuples of each size to save */
#endif
This optimization just means Python does not build some tuples from scratch. But there is still a lot of work to be done.
If `OrderedDict` is replaced by `dict`, I think the second solution is slightly better in general.
Python dictionaries are implemented using hash tables, so lookup is fast. The average time complexity of a lookup is O(1), while the worst case is O(n).[1]
The average time complexity of your first solution is the same as that of your second solution: they are both O(n). Therefore, the second solution has no advantage, or is even slower sometimes, especially when the input data is small. In that case, the extra cost of `iteritems` cannot be compensated for.
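Both solutions visit every element exactly once; for example, `min(..., key=...)` evaluates its key function once per element. A small Python 3 sketch with a hypothetical `counting_get` wrapper makes the O(n) shape concrete:

```python
d = {'a': 10, 'b': 9, 'c': 4, 'd': 7}
calls = 0

def counting_get(k):
    # stand-in for d.get that counts how often min() invokes it
    global calls
    calls += 1
    return d[k]

# The first solution's shape: min() calls the key function once per key.
winner = min(d, key=counting_get)
print(winner)  # c
print(calls)   # 4 -- one lookup per key, so O(n) overall
```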
from collections import OrderedDict
from operator import itemgetter
from timeit import Timer
from random import randint, random
N = 100000
xs = [('a', 10), ('b', 9), ('c', 4), ('d', 7), ('e', 3), ('f', 0), ('g', -5), ('h', 9)]
od = OrderedDict(xs)
d = dict(xs)
def f1od_min():
    return min(od, key=od.get)

def f2od_min():
    return min(od.iteritems(), key=itemgetter(1))[0]

def f1d_min():
    return min(d, key=d.get)

def f2d_min():
    return min(d.iteritems(), key=itemgetter(1))[0]

def f1od():
    for k in od: pass

def f2od():
    for t in od.iteritems(): pass

def f1d():
    for k in d: pass

def f2d():
    for t in d.iteritems(): pass
print 'min'
print(Timer(stmt='f1od_min()', setup='from __main__ import f1od_min').timeit(number=N))
print(Timer(stmt='f2od_min()', setup='from __main__ import f2od_min').timeit(number=N))
print(Timer(stmt='f1d_min()', setup='from __main__ import f1d_min').timeit(number=N))
print(Timer(stmt='f2d_min()', setup='from __main__ import f2d_min').timeit(number=N))
print
print 'traverse'
print(Timer(stmt='f1od()', setup='from __main__ import f1od').timeit(number=N))
print(Timer(stmt='f2od()', setup='from __main__ import f2od').timeit(number=N))
print(Timer(stmt='f1d()', setup='from __main__ import f1d').timeit(number=N))
print(Timer(stmt='f2d()', setup='from __main__ import f2d').timeit(number=N))
Output
min
0.398274431527
0.813040903243
0.185168156847
0.249574387248 <-- dict/the second solution
traverse
0.251634216081
0.642283865687
0.0565099754298
0.0958057518483
Then replace `N` and `xs` by:
N = 50
xs = [(x, randint(1, 100)) for x in range(100000)]
Output
min
1.5148923257
3.47020082161
0.712828585756
0.70823812803 <-- dict/the second solution
traverse
0.975989336634
2.92283956481
0.127676073356
0.253622387762
Now replace `N` and `xs` by:
N = 10
xs = [(random(), random()) for x in range(1000000)]
Output
min
6.23311265817
10.702984667
4.32852708934
2.87853889251 <-- dict/the second solution
traverse
2.06231783648
9.49360449443
1.33297618831
1.73723008092
Finally, the second solution starts to shine.
The worst case for the first solution: hash collisions.
Let
N = 10000
xs = [(2 ** (32 + x) - 2 ** x + 1, 1) for x in range(100)]
# hash(2 ** (32 + x) - 2 ** x + 1) is always 1
Output
min
2.44175265292 <-- lookup is slow
2.76424538594 <-- lookup is slow
2.26508627493 <-- lookup is slow
0.199363955475
traverse
0.200654482623
2.59635966303 <-- lookup is slow
0.0454684184722
0.0733798569371
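The collision trick above relies on Python 2's int hashing. On a 64-bit CPython 3 build, where positive int hashes are taken modulo the Mersenne prime 2**61 - 1, the analogous construction uses an exponent of 61 (a sketch of the same idea, not the author's original keys):

```python
# On 64-bit CPython 3, hash(n) for a positive int n is n mod (2**61 - 1).
# Since 2**61 is congruent to 1 (mod 2**61 - 1), we get
#   2**(61 + x) - 2**x + 1  ->  2**x - 2**x + 1  =  1,
# so every key below hashes to 1 and collides in the same probe chain.
keys = [2 ** (61 + x) - 2 ** x + 1 for x in range(100)]
print(all(hash(k) == 1 for k in keys))  # True on 64-bit CPython
```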
[1] The Average Case times listed for dict objects assume that the hash function for the objects is sufficiently robust to make collisions uncommon. The Average Case assumes the keys used in parameters are selected uniformly at random from the set of all keys. See TimeComplexity.
As for tuple reuse, I don't believe it happens here:
>>> a = (1,2)
>>> b = (1,2)
>>> id(a)
139912909456232
>>> id(b)
139912909456304
>>>
Compare this with an int or a string:
>>> a = 1
>>> b = 1
>>> id(a)
34961336
>>> id(b)
34961336
>>>
>>> a = 'a'
>>> b = 'a'
>>> id(a)
139912910202240
>>> id(b)
139912910202240
>>>
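The same identity checks can be made without relying on literals (which the compiler may fold into shared constants). A Python 3 sketch, leaning on CPython implementation details (the small-int cache, and the absence of tuple interning):

```python
# Build the objects at run time so the compiler cannot merge literals.
a = tuple([1, 2])
b = tuple([1, 2])
print(a == b, a is b)   # True False -- equal tuples, distinct objects

# CPython caches small ints (-5..256), so equal small ints share identity.
x = int('100')
y = int('100')
print(x == y, x is y)   # True True
```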
edit: For `dict`, your two methods are similar. Let's try:
>>> a = {'a':1, 'b':2, 'c':3}
>>> N = 100000
# really quick to use []
>>> Timer(stmt='for x in a: z = a[x]', setup='from __main__ import a').timeit(number=N)
0.0524289608001709
# use get method
>>> Timer(stmt='for x in a: z = a.get(x)', setup='from __main__ import a').timeit(number=N)
0.10028195381164551
# use iterator and []
>>> Timer(stmt='for x in a.iteritems(): z = x[1]', setup='from __main__ import a').timeit(number=N)
0.08019709587097168
# use itemgetter and iterator
>>> b = itemgetter(1)
>>> Timer(stmt='for x in a.iteritems(): z = b(x)', setup='from __main__ import a, b').timeit(number=N)
0.09941697120666504
Though the times may change between runs, they are accurate in general: using `iteritems` with `itemgetter` is about as quick as `get`.
But for `OrderedDict`, let's try again:
>>> a
OrderedDict([('a', 1), ('c', 3), ('b', 2)])
>>> N = 100000
#Use []
>>> Timer(stmt='for x in a: z = a[x]', setup='from __main__ import a').timeit(number=N)
0.2354598045349121
#Use get
>>> Timer(stmt='for x in a: z = a.get(x)', setup='from __main__ import a').timeit(number=N)
0.21950387954711914
#Use iterator
>>> Timer(stmt='for x in a.iteritems(): z = x[1]', setup='from __main__ import a').timeit(number=N)
0.29949188232421875
#Use iterator and itemgetter
>>> b = itemgetter(1)
>>> Timer(stmt='for x in a.iteritems(): z = b(x)', setup='from __main__ import a, b').timeit(number=N)
0.32039499282836914
You can see that, for `OrderedDict`, the version using `get` and the one using the iterator with `itemgetter` vary in time. So I think the time difference is caused by the implementation of `OrderedDict`. But sorry, I don't know exactly why.