Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: iterating over list vs over dict items efficiency

Is iterating over some_dict.items() as efficient as iterating over a list of the same items in CPython?

like image 447
Jonathan Livni Avatar asked Sep 22 '12 12:09

Jonathan Livni


People also ask

Is iterating through a dictionary faster than a list?

Using iteritems is a tad bit faster... But the time to create a view is negligable; it is actually slower to iterate over than a list. This means that in Python 3, if you want to iterate many times over the items in a dictionary, and performance is critical, you can get a 30% speedup by caching the view as a list.

Are lists or dictionaries more efficient in Python?

Analysis Of The Test Run ResultA dictionary is 6.6 times faster than a list when we lookup in 100 items.

Are dictionaries faster than lists Python?

The list is an ordered collection of data, whereas the dictionaries store the data in the form of key-value pairs using the hashtable structure. Due to this, fetching the elements from the list data structure is quite complex compared to dictionaries in Python. Therefore, the dictionary is faster than a list in Python.

Are Python Dicts slow?

Python is slow. I bet you might encounter this counterargument many times about using Python, especially from people who come from C or C++ or Java world. This is true in many cases, for instance, looping over or sorting Python arrays, lists, or dictionaries can be sometimes slow.


Video Answer


2 Answers

It depends on which version of Python you're using. In Python 2, some_dict.items() creates a new list, which takes up some additional time and uses up additional memory. On the other hand, once the list is created, it's a list, and so should have identical performance characteristics after the overhead of list creation is complete.

In Python 3, some_dict.items() creates a view object instead of a list, and I anticipate that creating and iterating over items() would be faster than in Python 2, since nothing has to be copied. But I also anticipate that iterating over an already-created view would be a bit slower than iterating over an already-created list, because dictionary data is stored somewhat sparsely, and I believe there's no good way for python to avoid iterating over every bin in the dictionary -- even the empty ones.

In Python 2, some timings confirm my intuitions:

>>> some_dict = dict(zip(xrange(1000), reversed(xrange(1000))))
>>> some_list = zip(xrange(1000), xrange(1000))
>>> %timeit for t in some_list: t
10000 loops, best of 3: 25.6 us per loop
>>> %timeit for t in some_dict.items(): t
10000 loops, best of 3: 57.3 us per loop

Iterating over the items is roughly twice as slow. Using iteritems is a tad bit faster...

>>> %timeit for t in some_dict.iteritems(): t
10000 loops, best of 3: 41.3 us per loop

But iterating over the list itself is basically the same as iterating over any other list:

>>> some_dict_list = some_dict.items()
>>> %timeit for t in some_dict_list: t
10000 loops, best of 3: 26.1 us per loop

Python 3 can create and iterate over items faster than Python 2 can (compare to 57.3 us above):

>>> some_dict = dict(zip(range(1000), reversed(range(1000))))
>>> %timeit for t in some_dict.items(): t      
10000 loops, best of 3: 33.4 us per loop 

But the time to create a view is negligable; it is actually slower to iterate over than a list.

>>> some_list = list(zip(range(1000), reversed(range(1000))))
>>> some_dict_view = some_dict.items()
>>> %timeit for t in some_list: t
10000 loops, best of 3: 18.6 us per loop
>>> %timeit for t in some_dict_view: t
10000 loops, best of 3: 33.3 us per loop

This means that in Python 3, if you want to iterate many times over the items in a dictionary, and performance is critical, you can get a 30% speedup by caching the view as a list.

>>> some_list = list(some_dict_view)
>>> %timeit for t in some_list: t
100000 loops, best of 3: 18.6 us per loop
like image 131
senderle Avatar answered Oct 19 '22 00:10

senderle


A little benchmark shows me that iterating a list is definately faster.

def iterlist(list_):
    i = 0
    for _ in list_:
        i += 1
    return i

def iterdict(dict_):
    i = 0
    for _ in dict_.iteritems():
        i += 1
    return i

def noiterdict(dict_):
    i = 0
    for _ in dict_.items():
        i += 1
    return i

list_ = range(1000000)
dict_ = dict(zip(range(1000000), range(1000000)))

Tested with IPython on Python 2.7 (Kubuntu):

%timeit iterlist(list_)
10 loops, best of 3: 28.5 ms per loop

%timeit iterdict(dict_)
10 loops, best of 3: 39.7 ms per loop

%timeit noiterdict(dict_)
10 loops, best of 3: 86.1 ms per loop
like image 8
Wolph Avatar answered Oct 18 '22 23:10

Wolph