Why is sum on lists (sometimes) faster than itertools.chain?

Tags: python, list

I answered several questions here by using this to "flatten" a list of lists:

>>> l = [[1,2,3],[4,5,6],[7,8,9]]
>>> sum(l,[])

it works fine and yields:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

However, I was told that the sum function effectively does a = a + b at each step, which is not as performant as itertools.chain.

My planned question was "why is it possible on lists where it is prevented on strings", but I made a quick benchmark on my machine comparing sum and itertools.chain.from_iterable on the same data:

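import itertools, timeit

# timeit does not run the statement in this script's namespace, so make the itertools import explicit in the setup
setup = 'import itertools; l = [[1,2,3],[4,5,6],[7,8,9]]'
print(timeit.timeit("sum(l,[])", setup=setup))
print(timeit.timeit("list(itertools.chain.from_iterable(l))", setup=setup))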

I did that several times and I always get about the same figures as below:

0.7155522836070246
0.9883352857722025

To my surprise, chain - recommended over sum for lists by everyone in several comments on my answers - is much slower.

It's still interesting when iterating in a for loop because it doesn't actually create the list, but when creating the list, sum wins.
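As a minimal sketch of that for-loop case (the variable name total is only for illustration), chain.from_iterable yields the elements one at a time, so no flattened list is ever built:

```python
import itertools

l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

total = 0
# Elements are produced lazily, one by one; no intermediate list is created
for x in itertools.chain.from_iterable(l):
    total += x

print(total)  # 45
```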

So should we drop itertools.chain and use sum when the expected result is a list?

EDIT: thanks to some comments, I ran another test with a larger number of lists:

s = 'import itertools; l = [[4,5,6] for _ in range(20)]'
print(timeit.timeit("sum(l,[])",setup=s))
print(timeit.timeit("list(itertools.chain.from_iterable(l))",setup=s))

now I get the opposite:

6.479897810702537
3.793455760814343


asked Jan 20 '17 20:01

Jean-François Fabre

1 Answer

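Your test inputs are tiny. At those scales, the horrific O(n^2) asymptotic runtime of the sum version isn't visible: each addition copies the entire list built so far, so the total work grows quadratically with the number of sublists. With only a handful of sublists, the timings are dominated by constant factors, and sum has a better constant factor, since it doesn't have to work through iterators.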
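To make the quadratic behaviour concrete, here is a rough pure-Python sketch of what sum(l, []) amounts to, next to a linear alternative. The helper names flatten_like_sum and flatten_with_extend are made up for illustration; the real sum is implemented in C, but it likewise builds a new list at every step.

```python
import itertools

def flatten_like_sum(lists):
    # Roughly what sum(lists, []) does: each step builds a brand-new list
    acc = []
    for sub in lists:
        acc = acc + sub  # copies everything accumulated so far, plus sub
    return acc

def flatten_with_extend(lists):
    # Linear alternative: grow a single list in place
    acc = []
    for sub in lists:
        acc.extend(sub)  # only the new elements are copied
    return acc

l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(flatten_like_sum(l))     # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(flatten_with_extend(l))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

All three approaches give the same result; they differ only in how much copying happens along the way.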

With bigger lists, it becomes clear that sum is not at all designed for this kind of thing:

>>> timeit.timeit('list(itertools.chain.from_iterable(l))',
...               'l = [[i] for i in xrange(5000)]; import itertools',
...               number=1000)
0.20425895931668947
>>> timeit.timeit('sum(l, [])', 'l = [[i] for i in xrange(5000)]', number=1000)
49.55303902059097
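Since timings depend on the machine and Python version, a simplified way to see the same gap is to count element copies instead of measuring time. The sketch below is a back-of-the-envelope model, not a description of CPython internals: the helper names copies_with_sum and copies_with_chain are hypothetical, and the model ignores constant factors and list over-allocation.

```python
def copies_with_sum(n, k):
    # Elements written when flattening n sublists of length k via acc = acc + sub
    copied = 0
    acc_len = 0
    for _ in range(n):
        acc_len += k       # the new list holds the old accumulator plus one sublist
        copied += acc_len  # and every element of it is written to fresh storage
    return copied

def copies_with_chain(n, k):
    # Each element is placed into the result exactly once in this model
    return n * k

# (n, k) pairs matching the inputs used in the question and in this answer
for n, k in [(3, 3), (20, 3), (5000, 1)]:
    print(n, k, copies_with_sum(n, k), copies_with_chain(n, k))
# 3 3 18 9
# 20 3 630 60
# 5000 1 12502500 5000
```

With three sublists the difference is a factor of two, which constant factors can easily hide; with 5000 sublists it is a factor of about 2500.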


answered Sep 30 '22 14:09

user2357112 supports Monica
