Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

OrderedDict vs defaultdict vs dict [closed]

In python's library, we now have two Python Implementation of dictionaries which subclasses dict over and above the native dict type.

Python's advocates have always preferred to defaultdict over using dict.setdefault where possible. Even the doc quotes that This technique is simpler and faster than an equivalent technique using dict.setdefault():

In similar ways, as dictionaries does not maintain order, using OrderedDict over using dict followed by sorting the items is preferred when ever possible for the alternative usage.

In both the above case, the code is definitely cleaner but at the cost of performance penalty.

While answering and commenting on one of the question python unique list based on item, I stumbled upon the performance penalty over the native dict when using defaultdict and OrderedDict. It also seems the size of the data is also not immaterial to the performance advantage dict solution has over others.

I believe There should be one-- and preferably only one --obvious way to do it., so what is the preferred way?

like image 787
Abhijit Avatar asked Oct 28 '13 08:10

Abhijit


People also ask

What is difference between Defaultdict and OrderedDict?

It depends on the data; setdefault is faster and simpler with small data sets; defaultdict is faster for larger data sets with more homogenous key sets (ie, how short the dict is after adding elements);

What is the difference between dict and Defaultdict?

The main difference between defaultdict and dict is that when you try to access or modify a key that's not present in the dictionary, a default value is automatically given to that key . In order to provide this functionality, the Python defaultdict type does two things: It overrides .

Is OrderedDict obsolete?

No it won't become redundant in Python 3.7 because OrderedDict is not just a dict that retains insertion order, it also offers an order dependent method, OrderedDict. move_to_end() , and supports reversed() iteration*.

When should I use OrderedDict?

Intent signaling: If you use OrderedDict over dict , then your code makes it clear that the order of items in the dictionary is important. You're clearly communicating that your code needs or relies on the order of items in the underlying dictionary.


1 Answers

There is not one single answer and not one true and only dict. Among many variables, it depends on:

  1. Size of the data set;
  2. Number of unique keys vs the number of duplicate keys in the set of data mappings;
  3. Speed of the underlying factory for defaultdict;
  4. Speed of OrderDict vs some later ordering step;
  5. Version of Python.

I am loathe to generalize, but here are some generalities:

  1. The statement This technique is simpler and faster than an equivalent technique using dict.setdefault() is just flat wrong. It depends on the data;
  2. setdefault is faster and simpler with small data sets;
  3. defaultdict is faster for larger data sets with more homogenous key sets (ie, how short the dict is after adding elements);
  4. setdefault has an advantage with more heterogeneous key sets;
  5. these results are different for Python 3 vs Python 2;
  6. OrderedDict is slower in all cases other than an algorithm that depends on order and order is not easy to reconstruct or sort;
  7. Python 3 is generally faster for most dict operations;
  8. Python 3.6's dict is now ordered by insertion order (reducing the usefulness of OrderedDict).

The only truth: It Depends! All three technique are useful.

Here is some timing code to show:

from __future__ import print_function from collections import defaultdict from collections import OrderedDict  try:     t=unichr(100) except NameError:     unichr=chr  def f1(li):     '''defaultdict'''     d = defaultdict(list)     for k, v in li:         d[k].append(v)     return d.items()  def f2(li):     '''setdefault'''     d={}     for k, v in li:         d.setdefault(k, []).append(v)     return d.items()  def f3(li):     '''OrderedDict'''     d=OrderedDict()     for k, v in li:         d.setdefault(k, []).append(v)     return d.items()         if __name__ == '__main__':     import timeit     import sys     print(sys.version)     few=[('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]     fmt='{:>12}: {:10.2f} micro sec/call ({:,} elements, {:,} keys)'     for tag, m, n in [('small',5,10000), ('medium',20,1000), ('bigger',1000,100), ('large',5000,10)]:         for f in [f1,f2,f3]:             s = few*m             res=timeit.timeit("{}(s)".format(f.__name__), setup="from __main__ import {}, s".format(f.__name__), number=n)             st=fmt.format(f.__doc__, res/n*1000000, len(s), len(f(s)))             print(st)             s = [(unichr(i%0x10000),i) for i in range(1,len(s)+1)]             res=timeit.timeit("{}(s)".format(f.__name__), setup="from __main__ import {}, s".format(f.__name__), number=n)             st=fmt.format(f.__doc__, res/n*1000000, len(s), len(f(s)))             print(st)                     print()  

Python 2.7 result:

2.7.5 (default, Aug 25 2013, 00:04:04)  [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]  defaultdict:      10.20 micro sec/call (25 elements, 3 keys)  defaultdict:      21.08 micro sec/call (25 elements, 25 keys)   setdefault:      13.41 micro sec/call (25 elements, 3 keys)   setdefault:      18.24 micro sec/call (25 elements, 25 keys)  OrderedDict:      49.47 micro sec/call (25 elements, 3 keys)  OrderedDict:     102.16 micro sec/call (25 elements, 25 keys)   defaultdict:      28.28 micro sec/call (100 elements, 3 keys)  defaultdict:      79.78 micro sec/call (100 elements, 100 keys)   setdefault:      45.68 micro sec/call (100 elements, 3 keys)   setdefault:      68.66 micro sec/call (100 elements, 100 keys)  OrderedDict:     117.78 micro sec/call (100 elements, 3 keys)  OrderedDict:     343.17 micro sec/call (100 elements, 100 keys)   defaultdict:    1123.60 micro sec/call (5,000 elements, 3 keys)  defaultdict:    4250.44 micro sec/call (5,000 elements, 5,000 keys)   setdefault:    2089.86 micro sec/call (5,000 elements, 3 keys)   setdefault:    3803.03 micro sec/call (5,000 elements, 5,000 keys)  OrderedDict:    4399.16 micro sec/call (5,000 elements, 3 keys)  OrderedDict:   16279.14 micro sec/call (5,000 elements, 5,000 keys)   defaultdict:    5609.39 micro sec/call (25,000 elements, 3 keys)  defaultdict:   25351.60 micro sec/call (25,000 elements, 25,000 keys)   setdefault:   10267.00 micro sec/call (25,000 elements, 3 keys)   setdefault:   24091.51 micro sec/call (25,000 elements, 25,000 keys)  OrderedDict:   22091.98 micro sec/call (25,000 elements, 3 keys)  OrderedDict:   94028.00 micro sec/call (25,000 elements, 25,000 keys) 

Python 3.3 result:

3.3.2 (default, May 21 2013, 11:50:47)  [GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))]  defaultdict:       8.58 micro sec/call (25 elements, 3 keys)  defaultdict:      21.18 micro sec/call (25 elements, 25 keys)   setdefault:      10.42 micro sec/call (25 elements, 3 keys)   setdefault:      14.58 micro sec/call (25 elements, 25 keys)  OrderedDict:      45.43 micro sec/call (25 elements, 3 keys)  OrderedDict:      92.69 micro sec/call (25 elements, 25 keys)   defaultdict:      20.47 micro sec/call (100 elements, 3 keys)  defaultdict:      77.48 micro sec/call (100 elements, 100 keys)   setdefault:      34.22 micro sec/call (100 elements, 3 keys)   setdefault:      54.86 micro sec/call (100 elements, 100 keys)  OrderedDict:     107.37 micro sec/call (100 elements, 3 keys)  OrderedDict:     318.98 micro sec/call (100 elements, 100 keys)   defaultdict:     714.70 micro sec/call (5,000 elements, 3 keys)  defaultdict:    3892.92 micro sec/call (5,000 elements, 5,000 keys)   setdefault:    1502.91 micro sec/call (5,000 elements, 3 keys)   setdefault:    2888.08 micro sec/call (5,000 elements, 5,000 keys)  OrderedDict:    3912.95 micro sec/call (5,000 elements, 3 keys)  OrderedDict:   14863.02 micro sec/call (5,000 elements, 5,000 keys)   defaultdict:    3649.02 micro sec/call (25,000 elements, 3 keys)  defaultdict:   22313.17 micro sec/call (25,000 elements, 25,000 keys)   setdefault:    7447.28 micro sec/call (25,000 elements, 3 keys)   setdefault:   18426.88 micro sec/call (25,000 elements, 25,000 keys)  OrderedDict:   19202.17 micro sec/call (25,000 elements, 3 keys)  OrderedDict:   85946.45 micro sec/call (25,000 elements, 25,000 keys) 
like image 200
dawg Avatar answered Oct 05 '22 14:10

dawg