In python's library, we now have two Python Implementation of dictionaries which subclasses dict
over and above the native dict
type.
Python's advocates have always preferred to defaultdict
over using dict.setdefault
where possible. Even the doc quotes that This technique is simpler and faster than an equivalent technique using dict.setdefault():
In similar ways, as dictionaries does not maintain order, using OrderedDict
over using dict
followed by sorting the items is preferred when ever possible for the alternative usage.
In both the above case, the code is definitely cleaner but at the cost of performance penalty.
While answering and commenting on one of the question python unique list based on item, I stumbled upon the performance penalty over the native dict
when using defaultdict
and OrderedDict
. It also seems the size of the data is also not immaterial to the performance advantage dict
solution has over others.
I believe There should be one-- and preferably only one --obvious way to do it.
, so what is the preferred way?
It depends on the data; setdefault is faster and simpler with small data sets; defaultdict is faster for larger data sets with more homogenous key sets (ie, how short the dict is after adding elements);
The main difference between defaultdict and dict is that when you try to access or modify a key that's not present in the dictionary, a default value is automatically given to that key . In order to provide this functionality, the Python defaultdict type does two things: It overrides .
No it won't become redundant in Python 3.7 because OrderedDict is not just a dict that retains insertion order, it also offers an order dependent method, OrderedDict. move_to_end() , and supports reversed() iteration*.
Intent signaling: If you use OrderedDict over dict , then your code makes it clear that the order of items in the dictionary is important. You're clearly communicating that your code needs or relies on the order of items in the underlying dictionary.
There is not one single answer and not one true and only dict. Among many variables, it depends on:
I am loathe to generalize, but here are some generalities:
This technique is simpler and faster than an equivalent technique using dict.setdefault()
is just flat wrong. It depends on the data;setdefault
is faster and simpler with small data sets;defaultdict
is faster for larger data sets with more homogenous key sets (ie, how short the dict is after adding elements);setdefault
has an advantage with more heterogeneous key sets;OrderedDict
is slower in all cases other than an algorithm that depends on order and order is not easy to reconstruct or sort;dict
operations;OrderedDict
). The only truth: It Depends! All three technique are useful.
Here is some timing code to show:
from __future__ import print_function from collections import defaultdict from collections import OrderedDict try: t=unichr(100) except NameError: unichr=chr def f1(li): '''defaultdict''' d = defaultdict(list) for k, v in li: d[k].append(v) return d.items() def f2(li): '''setdefault''' d={} for k, v in li: d.setdefault(k, []).append(v) return d.items() def f3(li): '''OrderedDict''' d=OrderedDict() for k, v in li: d.setdefault(k, []).append(v) return d.items() if __name__ == '__main__': import timeit import sys print(sys.version) few=[('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)] fmt='{:>12}: {:10.2f} micro sec/call ({:,} elements, {:,} keys)' for tag, m, n in [('small',5,10000), ('medium',20,1000), ('bigger',1000,100), ('large',5000,10)]: for f in [f1,f2,f3]: s = few*m res=timeit.timeit("{}(s)".format(f.__name__), setup="from __main__ import {}, s".format(f.__name__), number=n) st=fmt.format(f.__doc__, res/n*1000000, len(s), len(f(s))) print(st) s = [(unichr(i%0x10000),i) for i in range(1,len(s)+1)] res=timeit.timeit("{}(s)".format(f.__name__), setup="from __main__ import {}, s".format(f.__name__), number=n) st=fmt.format(f.__doc__, res/n*1000000, len(s), len(f(s))) print(st) print()
Python 2.7 result:
2.7.5 (default, Aug 25 2013, 00:04:04) [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] defaultdict: 10.20 micro sec/call (25 elements, 3 keys) defaultdict: 21.08 micro sec/call (25 elements, 25 keys) setdefault: 13.41 micro sec/call (25 elements, 3 keys) setdefault: 18.24 micro sec/call (25 elements, 25 keys) OrderedDict: 49.47 micro sec/call (25 elements, 3 keys) OrderedDict: 102.16 micro sec/call (25 elements, 25 keys) defaultdict: 28.28 micro sec/call (100 elements, 3 keys) defaultdict: 79.78 micro sec/call (100 elements, 100 keys) setdefault: 45.68 micro sec/call (100 elements, 3 keys) setdefault: 68.66 micro sec/call (100 elements, 100 keys) OrderedDict: 117.78 micro sec/call (100 elements, 3 keys) OrderedDict: 343.17 micro sec/call (100 elements, 100 keys) defaultdict: 1123.60 micro sec/call (5,000 elements, 3 keys) defaultdict: 4250.44 micro sec/call (5,000 elements, 5,000 keys) setdefault: 2089.86 micro sec/call (5,000 elements, 3 keys) setdefault: 3803.03 micro sec/call (5,000 elements, 5,000 keys) OrderedDict: 4399.16 micro sec/call (5,000 elements, 3 keys) OrderedDict: 16279.14 micro sec/call (5,000 elements, 5,000 keys) defaultdict: 5609.39 micro sec/call (25,000 elements, 3 keys) defaultdict: 25351.60 micro sec/call (25,000 elements, 25,000 keys) setdefault: 10267.00 micro sec/call (25,000 elements, 3 keys) setdefault: 24091.51 micro sec/call (25,000 elements, 25,000 keys) OrderedDict: 22091.98 micro sec/call (25,000 elements, 3 keys) OrderedDict: 94028.00 micro sec/call (25,000 elements, 25,000 keys)
Python 3.3 result:
3.3.2 (default, May 21 2013, 11:50:47) [GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] defaultdict: 8.58 micro sec/call (25 elements, 3 keys) defaultdict: 21.18 micro sec/call (25 elements, 25 keys) setdefault: 10.42 micro sec/call (25 elements, 3 keys) setdefault: 14.58 micro sec/call (25 elements, 25 keys) OrderedDict: 45.43 micro sec/call (25 elements, 3 keys) OrderedDict: 92.69 micro sec/call (25 elements, 25 keys) defaultdict: 20.47 micro sec/call (100 elements, 3 keys) defaultdict: 77.48 micro sec/call (100 elements, 100 keys) setdefault: 34.22 micro sec/call (100 elements, 3 keys) setdefault: 54.86 micro sec/call (100 elements, 100 keys) OrderedDict: 107.37 micro sec/call (100 elements, 3 keys) OrderedDict: 318.98 micro sec/call (100 elements, 100 keys) defaultdict: 714.70 micro sec/call (5,000 elements, 3 keys) defaultdict: 3892.92 micro sec/call (5,000 elements, 5,000 keys) setdefault: 1502.91 micro sec/call (5,000 elements, 3 keys) setdefault: 2888.08 micro sec/call (5,000 elements, 5,000 keys) OrderedDict: 3912.95 micro sec/call (5,000 elements, 3 keys) OrderedDict: 14863.02 micro sec/call (5,000 elements, 5,000 keys) defaultdict: 3649.02 micro sec/call (25,000 elements, 3 keys) defaultdict: 22313.17 micro sec/call (25,000 elements, 25,000 keys) setdefault: 7447.28 micro sec/call (25,000 elements, 3 keys) setdefault: 18426.88 micro sec/call (25,000 elements, 25,000 keys) OrderedDict: 19202.17 micro sec/call (25,000 elements, 3 keys) OrderedDict: 85946.45 micro sec/call (25,000 elements, 25,000 keys)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With