Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

why sum on lists is (sometimes) faster than itertools.chain?

Tags:

python

list

I answered several questions here by using this to "flatten" a list of lists:

>>> l = [[1,2,3],[4,5,6],[7,8,9]]
>>> sum(l,[])

it works fine and yields:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

although I was told that the sum operator does a = a + b which is not as performant as itertools.chain

My planned question was "why is it possible on lists where it is prevented on strings", but I made a quick benchmark on my machine comparing sum and itertools.chain.from_iterable on the same data:

import itertools,timeit

print(timeit.timeit("sum(l,[])",setup='l = [[1,2,3],[4,5,6],[7,8,9]]'))
print(timeit.timeit("list(itertools.chain.from_iterable(l))",setup='l = [[1,2,3],[4,5,6],[7,8,9]]'))

I did that several times and I always get about the same figures as below:

0.7155522836070246
0.9883352857722025

To my surprise, chain - recommended over sum for lists by everyone in several comments on my answers - is much slower.

It's still interesting when iterating in a for loop because it doesn't actually create the list, but when creating the list, sum wins.

So should we drop itertools.chain and use sum when the expected result is a list ?

EDIT: thanks to some comments, I made another test by increasing the number of lists

s = 'l = [[4,5,6] for _ in range(20)]'
print(timeit.timeit("sum(l,[])",setup=s))
print(timeit.timeit("list(itertools.chain.from_iterable(l))",setup=s))

now I get the opposite:

6.479897810702537
3.793455760814343
like image 431
Jean-François Fabre Avatar asked Jan 20 '17 20:01

Jean-François Fabre


People also ask

Is Itertools faster than list comprehension?

Comprehensions are better than for-loops to iterate over sequences, but the itertools module provides us with other faster, memory-efficient ways for iterating.

Is Itertools faster than for loops?

While this is a perfectly fine approach, it is important to remember that utilizing the itertools iterators means using iterators that are Pythonic implementations of iterators elsewhere. That being said, the iterators from itertools are often significantly faster than regular iteration from a standard Python for loop.

Why itertool is used in Python?

Itertools is a module in Python, it is used to iterate over data structures that can be stepped over using a for-loop. Such data structures are also known as iterables. This module works as a fast, memory-efficient tool that is used either by themselves or in combination to form iterator algebra.

How does Itertools chain work?

itertools. chain is a generator function which accepts iterables as arguments. The function starts by iteratively returning each element from the first argument until it is exhausted. It then moves on to the next iterable and repeats the process.


1 Answers

Your test inputs are tiny. At those scales, the horrific O(n^2) asymptotic runtime of the sum version isn't visible. The timings are dominated by constant factors, and sum has a better constant factor, since it doesn't have to work through iterators.

With bigger lists, it becomes clear that sum is not at all designed for this kind of thing:

>>> timeit.timeit('list(itertools.chain.from_iterable(l))',
...               'l = [[i] for i in xrange(5000)]; import itertools',
...               number=1000)
0.20425895931668947
>>> timeit.timeit('sum(l, [])', 'l = [[i] for i in xrange(5000)]', number=1000)
49.55303902059097
like image 122
user2357112 supports Monica Avatar answered Sep 30 '22 14:09

user2357112 supports Monica