
Python Generator that yields more results takes more time to create

I have the following code in Python:

import time
import sys

def returnlist(times):
    t = time.time()
    l = [i for i in range(times)]
    print "list: {}".format(time.time() - t)
    return l

def returngenerator(times):
    t = time.time()
    g = (i for i in range(times))
    print "generator: {}".format(time.time() - t)
    return g

times = 1000000  # 10000000 for the second run
g = returngenerator(times)
l = returnlist(times)

1. For times = 1000000 I get the results:

generator: 0.107323884964

list: 0.225493192673

2. For times = 10000000 I get:

generator: 0.856524944305

list: 1.83883309364

I understand why the second list takes longer to create, but why does the second generator take longer as well? I assumed that, because of lazy evaluation, it would take about the same time to create as the first generator.

I am running this program on an Ubuntu VM.

GeorgeG asked Apr 08 '14 11:04




1 Answer

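The problem in your code is the range function. In Python 2, range builds a complete list, and a generator expression evaluates its outermost iterable as soon as it is created. The whole list of times elements is therefore built before the generator is returned; only the iteration over that list is deferred. That is why creating the generator takes longer as times grows. In Python 3, range returns a lazy range object instead of a list. In Python 2, the workaround is to use the xrange function, which is lazy as well.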
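
To see the eager evaluation of the outermost iterable directly, here is a minimal sketch. It is written for Python 3 (an assumption; the question's code is Python 2), and noisy_iterable and log are hypothetical names introduced only for this illustration.

```python
def noisy_iterable(n, log):
    """Hypothetical helper: records that it was called, then returns a list."""
    log.append("iterable built")
    return list(range(n))


log = []

# Creating the generator expression calls noisy_iterable() right away,
# because the outermost iterable is evaluated at creation time.
gen = (i * 2 for i in noisy_iterable(3, log))
built_at_creation = list(log)  # snapshot taken before any iteration

# The element expression (i * 2) only runs once we start iterating.
values = list(gen)

print(built_at_creation)
print(values)
```

The log already holds one entry before any value has been pulled from the generator, which mirrors what range(times) does in the Python 2 benchmark.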

As a test, let's create a benchmark function like yours, but using xrange:

def returngenerator2(times):
    t = time.time()
    g = (i for i in xrange(times))
    print "generator2: {}".format(time.time() - t)
    return g

And test it:

>>> l = returnlist(10**7)
list: 0.580000162125
>>> g = returngenerator(10**7)
generator: 0.115000009537
>>> x = returngenerator2(10**7)
generator2: 0.0
>>> x2 = returngenerator2(10**8)
generator2: 0.0
>>> x3 = returngenerator2(10**9)
generator2: 0.0

Seems to work. :)
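
For comparison, a rough Python 3 version of the same benchmark might look like the sketch below. It assumes Python 3, where range is already lazy, and uses time.perf_counter instead of time.time; make_generator is a hypothetical name, and the printed timing will vary by machine.

```python
import sys
import time


def make_generator(times):
    """Hypothetical helper: time only the creation of the generator."""
    start = time.perf_counter()
    gen = (i for i in range(times))  # range() is lazy in Python 3
    elapsed = time.perf_counter() - start
    return gen, elapsed


gen, elapsed = make_generator(10**9)
print("generator: {:.6f}".format(elapsed))

# A range object stores only start, stop and step, so its size in memory
# does not depend on how many values it represents.
lazy = range(10**9)
print(len(lazy), sys.getsizeof(lazy))
```

No list of a billion integers is ever built here, so creation should be near-instant regardless of times.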

Carsten answered Oct 23 '22 18:10