In Tim Peters' answer to "Are there any reasons not to use an ordered dictionary", he says:
OrderedDict is a subclass of dict.
It's not a lot slower, but at least doubles the memory over using a plain dict.
Now, while going through a particular question, I tried some sample checks using IPython, and both of them contradict the earlier reasoning:
dict and OrderedDict are of the same size
operating on an OrderedDict easily takes around 7-8 times longer than operating on a dict (hence a lot slower)
Can someone explain to me where I'm going wrong in my reasoning?
import sys
import random
from collections import OrderedDict
test_dict = {}
test_ordered_dict = OrderedDict()
for key in range(10000):
    test_dict[key] = random.random()
    test_ordered_dict[key] = random.random()
sys.getsizeof(test_dict)
786712
sys.getsizeof(test_ordered_dict)
786712
%timeit:
import sys
import random
from collections import OrderedDict
def operate_on_dict(r):
    test_dict = {}
    for key in range(r):
        test_dict[key] = random.random()

def operate_on_ordered_dict(r):
    test_ordered_dict = OrderedDict()
    for key in range(r):
        test_ordered_dict[key] = random.random()
%timeit for x in range(100): operate_on_ordered_dict(100)
100 loops, best of 3: 9.24 ms per loop
%timeit for x in range(100): operate_on_dict(100)
1000 loops, best of 3: 1.23 ms per loop
I think the problem with size is due to the fact that there's no __sizeof__
method defined in the Python 2.X implementation of OrderedDict,
so it simply falls back to dict's __sizeof__
method.
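A quick way to see this for yourself (a sketch, not from the original post) is to check whether OrderedDict defines __sizeof__ in its own class dictionary:

# Check whether OrderedDict provides its own __sizeof__ or inherits dict's.
# On Python 2.7 this should print False (so sys.getsizeof falls back to dict's
# implementation); on recent Python 3 versions it should print True.
from collections import OrderedDict

print('__sizeof__' in OrderedDict.__dict__)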
To prove this, I've created a class A
which extends list
and also adds an extra method foo
, to check whether that affects the reported size.
class A(list):
    def __getitem__(self, k):
        return list.__getitem__(self, k)
    def foo(self):
        print 'abcde'
>>> a = A(range(1000))
>>> b = list(range(1000))
But sys.getsizeof
still returns the same size for both:
>>> sys.getsizeof(a), sys.getsizeof(b)
(9120, 9120)
Of course A
is going to be slow, because its methods run in Python while list's methods run in pure C.
>>> %%timeit
... for _ in xrange(1000):
...     a[_]
...
1000 loops, best of 3: 449 µs per loop
>>> %%timeit
... for _ in xrange(1000):
...     b[_]
...
10000 loops, best of 3: 52 µs per loop
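The same reasoning applies to the timings in the question: in Python 2, OrderedDict.__setitem__ is pure Python and has to maintain a doubly linked list on every insertion before delegating to dict, whereas a plain dict insert is a single C call. Roughly (condensed from the CPython 2.7 collections source; treat the details as approximate):

def __setitem__(self, key, value, dict_setitem=dict.__setitem__):
    # A new key appends a [prev, next, key] link to the doubly linked list
    # kept in self.__root / self.__map, then the inherited dict is updated --
    # all of this runs as Python-level bytecode on every insertion.
    if key not in self:
        root = self.__root
        last = root[0]
        last[1] = root[0] = self.__map[key] = [last, root, key]
    return dict_setitem(self, key, value)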
And this seems to be fixed in Python 3, where there's a well-defined __sizeof__
method now:
def __sizeof__(self):
    sizeof = _sys.getsizeof
    n = len(self) + 1                       # number of links including root
    size = sizeof(self.__dict__)            # instance dictionary
    size += sizeof(self.__map) * 2          # internal dict and inherited dict
    size += sizeof(self.__hardroot) * n     # link objects
    size += sizeof(self.__root) * n         # proxy objects
    return size
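So on Python 3 the two sizes should actually differ. A quick sanity check (a sketch, not from the original post; the exact numbers depend on the interpreter version and build):

import sys
from collections import OrderedDict

d = {k: None for k in range(10000)}
od = OrderedDict((k, None) for k in range(10000))

# On Python 3, sys.getsizeof(od) should come out larger than sys.getsizeof(d),
# since OrderedDict now accounts for its extra ordering bookkeeping.
print(sys.getsizeof(d), sys.getsizeof(od))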