I expected that with multiple loops, iterating over a list would be much faster than iterating over a generator, but my code suggests this is false.
My understanding was this (by "operation" I mean evaluating the expression that defines an element): a list costs n operations to build, but every subsequent loop over it is a cheap traversal, whereas a generator performs n operations on every loop.
And I checked my expectations using the following code:
from timeit import timeit


def pow2_list(n):
    """Return a list with powers of 2."""
    results = []
    for i in range(n):
        results.append(2**i)
    return results


def pow2_gen(n):
    """Generator of powers of 2."""
    for i in range(n):
        yield 2**i


def loop(iterator, n=1000):
    """Loop n times over an iterable object."""
    for _ in range(n):
        for _ in iterator:
            pass


l = pow2_list(1000)  # points to a list
g = pow2_gen(1000)   # points to a generator

time_list = timeit("loop(l)", setup="from __main__ import loop, l", number=10)
time_gen = timeit("loop(g)", setup="from __main__ import loop, g", number=10)

print("Loops over list took: ", time_list)
print("Loops over generator took: ", time_gen)
And the results surprised me...
Loops over list took: 0.20484769299946493
Loops over generator took: 0.0019217690005461918
Somehow using a generator appears much faster than using a list, even when looping 1000 times. And in this case we are talking about two orders of magnitude! Why?
EDIT:
Thanks for the answers. Now I see my mistake. I wrongly assumed that a generator starts from the beginning on each new loop, like range does:
>>> x = range(10)
>>> sum(x)
45
>>> sum(x)
45
But this was naive (a range is not a generator...).
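To make the difference concrete, here is a small illustration of my own (not part of the original timing code): a range can be summed twice, while a generator is exhausted after one pass.

x = range(10)                # a re-iterable range object
print(sum(x), sum(x))        # 45 45

g = (i for i in range(10))   # a generator: consumed after one pass
print(sum(g), sum(g))        # 45 0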
Regarding the possible-duplicate comment: my problem concerned multiple loops over a generator, which is not explained in the other thread.
Your generator is actually only looping once. Once created with pow2_gen, g stores a generator; the very first time through loop, this generator is consumed and raises StopIteration. On every subsequent time through loop, next(g) (or g.next() in Python 2) just keeps raising StopIteration, so, in effect, g represents an empty sequence.
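You can see this exhaustion directly with next(). A minimal sketch of my own, using a generator expression rather than pow2_gen:

g = (2**i for i in range(3))

print(list(g))   # [1, 2, 4] -- the first pass consumes the generator
print(list(g))   # []        -- already exhausted, nothing left to yield

try:
    next(g)      # an exhausted generator raises StopIteration immediately
except StopIteration:
    print("generator is exhausted")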
To make the comparison fairer, you would need to re-create the generator every time you loop.
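For example, a direct fix in terms of the question's own pow2_gen (a sketch, not part of the timing code below):

for _ in range(1000):
    for _ in pow2_gen(1000):  # a fresh generator for every pass
        pass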
A further difficulty with the way you've approached this is that you're calling append to build your list, which is likely the very slowest way to construct a list. More often, lists are built with list comprehensions.
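As a rough, hedged illustration of that gap (my own sketch; absolute numbers will vary by machine):

from timeit import timeit

# Time an append loop against an equivalent list comprehension.
append_time = timeit(
    "r = []\nfor i in range(1000):\n    r.append(2**i)", number=1000)
comp_time = timeit("[2**i for i in range(1000)]", number=1000)
print("append:        %.3f" % append_time)
print("comprehension: %.3f" % comp_time)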
The following code lets us pick apart the timing a bit more carefully. create_list and create_gen create lists and generators, respectively, using a list comprehension and a generator expression. time_loop is like your loop method, while time_apply is a version of loop that re-creates the iterable each time through the loop.
def create_list(n=1000):
    return [2**i for i in range(n)]


def create_gen(n=1000):
    return (2**i for i in range(n))


def time_loop(iterator, n=1000):
    for t in range(n):
        for v in iterator:
            pass


def time_apply(create_fn, fn_arg, n=1000):
    for t in range(n):
        iterator = create_fn(fn_arg)
        time_loop(iterator, 1)


print('time_loop(create_list): %.3f' % timeit("time_loop(create_list(1000))",
                                              setup="from __main__ import *",
                                              number=10))
print('time_loop(create_gen): %.3f' % timeit("time_loop(create_gen(1000))",
                                             setup="from __main__ import *",
                                             number=10))
print('time_apply(create_list): %.3f' % timeit("time_apply(create_list, 1000)",
                                               setup="from __main__ import *",
                                               number=10))
print('time_apply(create_gen): %.3f' % timeit("time_apply(create_gen, 1000)",
                                              setup="from __main__ import *",
                                              number=10))
Results on my box suggest that repeatedly building and consuming a list (time_apply(create_list)) takes about as long as (or maybe even less time than) repeatedly building and consuming a generator (time_apply(create_gen)).
time_loop(create_list): 0.244
time_loop(create_gen): 0.028
time_apply(create_list): 21.190
time_apply(create_gen): 21.555
You can see the same effect you've documented in your question, which is that time_loop(create_gen) is an order of magnitude faster than time_loop(create_list). Again, this is because the generator is only being iterated over once, rather than being looped over many times like the list.
As you hypothesise, building a list once and iterating over it many times (time_loop(create_list)) is faster than re-creating and consuming a generator on every pass (time_apply(create_gen)) in this particular scenario.
The trade-off between a list and a generator is going to depend strongly on how large the iterable you're creating is. With 1,000 items, I would expect lists to be pretty fast. With 100,000 items, things might look different.
print('create big list: %.3f' % timeit("l = create_list(100000)",
                                       setup="from __main__ import *",
                                       number=10))
print('create big gen: %.3f' % timeit("g = create_gen(100000)",
                                      setup="from __main__ import *",
                                      number=10))
Here I'm getting:
create big list: 209.748
create big gen: 0.023
Python uses up between 700 and 800 MB of memory building the big list; the generator uses almost nothing at all. Memory allocation and garbage collection are computationally expensive in Python and predictably slow your code down; generators are a very simple way to avoid gobbling up your machine's RAM, and that can make a big difference to runtime.
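A rough way to see the memory asymmetry (my own sketch; note that sys.getsizeof reports only the shallow container size, so the list's true footprint, including its big-integer elements, is far larger):

import sys

big_list = [2**i for i in range(100000)]
big_gen = (2**i for i in range(100000))

print(sys.getsizeof(big_list))  # hundreds of KB for the pointer array alone
print(sys.getsizeof(big_gen))   # a couple of hundred bytes, regardless of n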