
Speed of iterating multiple times over a generator compared to a list

I expected that, in the case of multiple loops, iterating over a list would be much faster than using a generator, but my code suggests this is false.

My understanding is (by operation I mean any expression defining an element):

  • a list requires n operations to be initialized
  • but then every loop over the list just grabs elements from memory
  • thus, m loops over a list require only n operations
  • a generator does not require any operations to be initialized
  • however, looping over a generator runs the operations on the fly
  • thus, one loop over a generator requires n operations
  • but m loops over a generator require n × m operations

And I checked my expectations using the following code:

from timeit import timeit

def pow2_list(n):
    """Return a list with powers of 2"""

    results = []

    for i in range(n):
        results.append(2**i)

    return results

def pow2_gen(n):
    """Generator of powers of 2"""

    for i in range(n):
        yield 2**i

def loop(iterator, n=1000):
    """Loop n times over iterable object"""

    for _ in range(n):
        for _ in iterator:
            pass

l = pow2_list(1000) # point to a list
g = pow2_gen(1000)  # point to a generator


time_list = \
    timeit("loop(l)", setup="from __main__ import loop, l", number=10)

time_gen = \
    timeit("loop(g)", setup="from __main__ import loop, g", number=10)

print("Loops over list took: ", time_list)
print("Loops over generator took: ", time_gen)

And the results surprised me...

Loops over list took:  0.20484769299946493
Loops over generator took:  0.0019217690005461918

Somehow using the generator appears much faster than the list, even when looping over it 1000 times, and in this case we are talking about two orders of magnitude! Why?

EDIT:

Thanks for the answers. Now I see my mistake. I wrongly assumed that a generator starts from the beginning on each new loop, like range does:

>>> x = range(10)
>>> sum(x)
45
>>> sum(x)
45

But this was naive (range is not a generator...).
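
To make the mistake concrete, here is a quick check with the pow2_gen defined above (unlike range, the generator is exhausted after one pass):

>>> g = pow2_gen(3)
>>> sum(g)            # first pass: 1 + 2 + 4
7
>>> sum(g)            # the generator is now exhausted, nothing is produced
0
>>> sum(pow2_gen(3))  # a fresh generator yields the full sequence again
7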

Regarding the possible-duplicate comment: my problem concerned multiple loops over a generator, which is not explained in the other thread.

1 Answer

Your generator is actually only being looped over once. Once created with pow2_gen, g stores a generator object; the very first time through loop, this generator is consumed and raises StopIteration. On every subsequent pass through loop, next(g) (or g.next() in Python 2) just keeps raising StopIteration, so, in effect, g behaves like an empty sequence.

To make the comparison more fair, you would need to re-create the generator every time you loop.
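
For example, a rough sketch of what that could look like, reusing the pow2_gen function from your question (the loop_fresh_gen name here is just for illustration):

def loop_fresh_gen(size=1000, n=1000):
    """Loop n times, re-creating the generator before every pass."""
    for _ in range(n):
        for _ in pow2_gen(size):  # a fresh generator, so each pass really costs `size` operations
            pass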

A further difficulty with the way you've approached this is that you're calling append to build your list, which is likely the very slowest way to construct a list. More often, lists are built with list comprehensions.

The following code lets us pick apart the timing a bit more carefully. create_list and create_gen create lists and generators, respectively, using a list comprehension and a generator expression. time_loop is like your loop function, while time_apply is a version of loop that re-creates the iterable each time through the loop.

from timeit import timeit

def create_list(n=1000):
    return [2**i for i in range(n)]

def create_gen(n=1000):
    return (2**i for i in range(n))

def time_loop(iterator, n=1000):
    for t in range(n):
        for v in iterator:
            pass

def time_apply(create_fn, fn_arg, n=1000):
    for t in range(n):
        iterator = create_fn(fn_arg)
        time_loop(iterator, 1)

print('time_loop(create_list): %.3f' % timeit("time_loop(create_list(1000))",
                                              setup="from __main__ import *",
                                              number=10))

print('time_loop(create_gen): %.3f' % timeit("time_loop(create_gen(1000))",
                                             setup="from __main__ import *",
                                             number=10))

print('time_apply(create_list): %.3f' % timeit("time_apply(create_list, 1000)",
                                               setup="from __main__ import *",
                                               number=10))

print('time_apply(create_gen): %.3f' % timeit("time_apply(create_gen, 1000)",
                                              setup="from __main__ import *",
                                              number=10))

Results on my box suggest that building a list and looping over it once (time_apply(create_list)) is similar in time to (or maybe even faster than) building a generator and consuming it (time_apply(create_gen)).

time_loop(create_list): 0.244
time_loop(create_gen): 0.028
time_apply(create_list): 21.190
time_apply(create_gen): 21.555

You can see the same effect you've documented in your question, which is that time_loop(create_gen) is an order of magnitude faster than time_loop(create_list). Again, this is because the generator created is only being iterated over once, rather than the many loops over the list.

As you hypothesise, building a list once and iterating over it many times (time_loop(create_list)) is faster than re-creating and iterating over a generator many times (time_apply(create_gen)) in this particular scenario.

The trade-off between list and generator is going to be strongly dependent on how big the iterator you're creating is. With 1000 items, I would expect lists to be pretty fast. With 100,000 items, things might look different.

print('create big list: %.3f' % timeit("l = create_list(100000)",
                                       setup="from __main__ import *",
                                       number=10))

print('create big gen: %.3f' % timeit("g = create_gen(100000)",
                                      setup="from __main__ import *",
                                      number=10))

Here I'm getting:

create big list: 209.748
create big gen: 0.023

Python uses up between 700 and 800 MB of memory building the big list; the generator uses almost nothing at all. Memory allocation and garbage collection are computationally expensive in Python, and predictably make your code slow; generators are a very simple way to avoid gobbling up your machine's RAM, and can make a big difference to runtime.
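
If you want to check the memory side of this yourself, something along these lines should show the difference (a rough sketch using tracemalloc from the standard library; the exact numbers will vary by machine and Python version):

import tracemalloc

tracemalloc.start()
big_list = [2**i for i in range(100000)]    # materialises every element up front
list_peak = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

tracemalloc.start()
big_gen = (2**i for i in range(100000))     # only a small generator object is allocated
gen_peak = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

print('list peak: %d bytes' % list_peak)
print('generator peak: %d bytes' % gen_peak)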
