Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is the defaultdict in Python's collections module really faster than using setdefault?

I've seen other Python programmers use defaultdict from the collections module for the following use case:

from collections import defaultdict

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]

def main():
    d = defaultdict(list)
    for k, v in s:
        d[k].append(v)

I've typically approached this problem by using setdefault instead:

def main():
    d = {}
    for k, v in s:
        d.setdefault(k, []).append(v)

The docs do in fact claim that using defaultdict is faster, but I've seen the opposite to be true when testing myself:

$ python -mtimeit -s "from withsetdefault import main; s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)];" "main()"
100000 loops, best of 3: 4.51 usec per loop
$ python -mtimeit -s "from withdefaultdict import main; s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)];" "main()"
100000 loops, best of 3: 5.38 usec per loop

Is there something wrong with how I've set up the tests?

For reference, I'm using Python 2.7.3 [GCC 4.2.1 (Apple Inc. build 5666)

like image 379
damzam Avatar asked Sep 23 '12 20:09

damzam


1 Answers

Yes, there is something "wrong":

You have put the creation of the (default)dict into the statement instead of the setup. Constructing a new defaultdict is more expensive than a normal dict, and usually that's not the bottleneck you should be profiling in a program - after all, you build your data structures once but you use them many times.

If you do your tests like below, you see that defaultdict operations are indeed faster:

>>> import timeit
>>> setup1 = """from collections import defaultdict
... s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
... d = defaultdict(list)"""
>>> stmt1 = """for k, v in s:
...     d[k].append(v)"""
>>> setup2 = """s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
... d = {}"""
>>> stmt2 = """for k, v in s:
...     d.setdefault(k, []).append(v)"""
>>> timeit.timeit(setup=setup1, stmt=stmt1)
1.0283400125194078
>>> timeit.timeit(setup=setup2, stmt=stmt2)
1.7767367580925395

Python 2.7.3 on Win7 x64.

like image 55
Tim Pietzcker Avatar answered Oct 20 '22 11:10

Tim Pietzcker