I answered several questions here by using this to "flatten" a list of lists:
>>> l = [[1,2,3],[4,5,6],[7,8,9]]
>>> sum(l,[])
it works fine and yields:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
although I was told that the sum
operator does a = a + b
which is not as performant as itertools.chain
My planned question was "why is it possible on lists where it is prevented on strings", but I made a quick benchmark on my machine comparing sum
and itertools.chain.from_iterable
on the same data:
import itertools,timeit
print(timeit.timeit("sum(l,[])",setup='l = [[1,2,3],[4,5,6],[7,8,9]]'))
print(timeit.timeit("list(itertools.chain.from_iterable(l))",setup='l = [[1,2,3],[4,5,6],[7,8,9]]'))
I did that several times and I always get about the same figures as below:
0.7155522836070246
0.9883352857722025
To my surprise, chain
- recommended over sum
for lists by everyone in several comments on my answers - is much slower.
It's still interesting when iterating in a for
loop because it doesn't actually create the list, but when creating the list, sum
wins.
So should we drop itertools.chain
and use sum
when the expected result is a list
?
EDIT: thanks to some comments, I made another test by increasing the number of lists
s = 'l = [[4,5,6] for _ in range(20)]'
print(timeit.timeit("sum(l,[])",setup=s))
print(timeit.timeit("list(itertools.chain.from_iterable(l))",setup=s))
now I get the opposite:
6.479897810702537
3.793455760814343
Comprehensions are better than for-loops to iterate over sequences, but the itertools module provides us with other faster, memory-efficient ways for iterating.
While this is a perfectly fine approach, it is important to remember that utilizing the itertools iterators means using iterators that are Pythonic implementations of iterators elsewhere. That being said, the iterators from itertools are often significantly faster than regular iteration from a standard Python for loop.
Itertools is a module in Python, it is used to iterate over data structures that can be stepped over using a for-loop. Such data structures are also known as iterables. This module works as a fast, memory-efficient tool that is used either by themselves or in combination to form iterator algebra.
itertools. chain is a generator function which accepts iterables as arguments. The function starts by iteratively returning each element from the first argument until it is exhausted. It then moves on to the next iterable and repeats the process.
Your test inputs are tiny. At those scales, the horrific O(n^2) asymptotic runtime of the sum
version isn't visible. The timings are dominated by constant factors, and sum
has a better constant factor, since it doesn't have to work through iterators.
With bigger lists, it becomes clear that sum
is not at all designed for this kind of thing:
>>> timeit.timeit('list(itertools.chain.from_iterable(l))',
... 'l = [[i] for i in xrange(5000)]; import itertools',
... number=1000)
0.20425895931668947
>>> timeit.timeit('sum(l, [])', 'l = [[i] for i in xrange(5000)]', number=1000)
49.55303902059097
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With