OrderedDict vs dict in Python

In Tim Peters' answer to "Are there any reasons not to use an ordered dictionary", he says:

OrderedDict is a subclass of dict.

It's not a lot slower, but at least doubles the memory over using a plain dict.
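The subclass claim itself is easy to verify directly; a minimal sketch:

from collections import OrderedDict

# OrderedDict really is a subclass of dict, so it passes the usual checks
print(issubclass(OrderedDict, dict))      # True
print(isinstance(OrderedDict(), dict))    # True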

Now, while going through a particular question, I tried some sample checks using IPython, and both of them seem to contradict the earlier reasoning:

  1. Both dict and OrderedDict are the same size.
  2. Operating on an OrderedDict easily takes around 7-8 times as long as operating on a dict (hence, a lot slower).

Can someone explain to me where I'm going wrong in my reasoning?


Create a large dict and an OrderedDict and compare their sizes:

import sys
import random
from collections import OrderedDict

test_dict = {}
test_ordered_dict = OrderedDict()

# fill both mappings with the same 10000 integer keys and random float values
for key in range(10000):
    test_dict[key] = random.random()
    test_ordered_dict[key] = random.random()

>>> sys.getsizeof(test_dict)
786712
>>> sys.getsizeof(test_ordered_dict)
786712

Check the time taken for the insertions using %timeit:

import random
from collections import OrderedDict

def operate_on_dict(r):
    # insert r keys with random float values into a plain dict
    test_dict = {}
    for key in range(r):
        test_dict[key] = random.random()

def operate_on_ordered_dict(r):
    # insert r keys with random float values into an OrderedDict
    test_ordered_dict = OrderedDict()
    for key in range(r):
        test_ordered_dict[key] = random.random()

%timeit for x in range(100): operate_on_ordered_dict(100)
100 loops, best of 3: 9.24 ms per loop

%timeit for x in range(100): operate_on_dict(100)
1000 loops, best of 3: 1.23 ms per loop
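For reference, roughly the same measurement can be made outside IPython with the standard timeit module; a minimal sketch (absolute numbers will differ by machine and Python version):

import timeit

setup = """
import random
from collections import OrderedDict

def operate_on_dict(r):
    test_dict = {}
    for key in range(r):
        test_dict[key] = random.random()

def operate_on_ordered_dict(r):
    test_ordered_dict = OrderedDict()
    for key in range(r):
        test_ordered_dict[key] = random.random()
"""

# time 100 calls of each function, repeat 3 times, keep the best run
print(min(timeit.repeat("operate_on_dict(100)", setup=setup, number=100, repeat=3)))
print(min(timeit.repeat("operate_on_ordered_dict(100)", setup=setup, number=100, repeat=3)))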
asked Jul 31 '14 by Anshul Goyal

1 Answer

I think the problem with the size is due to the fact that there's no __sizeof__ method defined in the Python 2.x implementation of OrderedDict, so it simply falls back to dict's __sizeof__ method.
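A quick way to see that is to check whether OrderedDict defines the method on the class itself rather than inheriting it; a minimal sketch (run on Python 2.7):

from collections import OrderedDict

# On Python 2.7 OrderedDict has no __sizeof__ of its own, so attribute
# lookup falls through to dict.__sizeof__ (on Python 3 this prints True)
print('__sizeof__' in vars(OrderedDict))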

To prove this, I've created a class A which extends list, and also added an extra method foo to check whether that affects the size.

class A(list):
    # pure pass-through __getitem__, but now dispatched through Python code
    def __getitem__(self, k):
        return list.__getitem__(self, k)
    # an extra method, to see whether it affects the reported size
    def foo(self):
        print 'abcde'

>>> a = A(range(1000))
>>> b = list(range(1000))

But sys.getsizeof still returns the same size for both:

>>> sys.getsizeof(a), sys.getsizeof(b)
(9120, 9120)

Of course, A is going to be slower, because its methods run in Python while list's methods run in pure C.

>>> %%timeit
... for _ in xrange(1000):
...     a[_]
... 
1000 loops, best of 3: 449 µs per loop

>>> %%timeit
... for _ in xrange(1000):
...     b[_]
... 
10000 loops, best of 3: 52 µs per loop

And this seems to be fixed in Python 3, where there's a well-defined __sizeof__ method now:

def __sizeof__(self):
    sizeof = _sys.getsizeof
    n = len(self) + 1                       # number of links including root
    size = sizeof(self.__dict__)            # instance dictionary
    size += sizeof(self.__map) * 2          # internal dict and inherited dict
    size += sizeof(self.__hardroot) * n     # link objects
    size += sizeof(self.__root) * n         # proxy objects
    return size
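With that in place, repeating the getsizeof comparison on Python 3 should report a noticeably larger OrderedDict; a minimal sketch (exact numbers depend on the Python version):

import sys
import random
from collections import OrderedDict

test_dict = {key: random.random() for key in range(10000)}
test_ordered_dict = OrderedDict((key, random.random()) for key in range(10000))

# On Python 3 the OrderedDict reports a larger size than the plain dict,
# because its __sizeof__ also accounts for the extra link structures
print(sys.getsizeof(test_dict))
print(sys.getsizeof(test_ordered_dict))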
answered by Ashwini Chaudhary