Python Generator that yields more results takes more time to create

Tags: python, generator

I have the following code in Python:

import time
import sys

def returnlist(times):
    t = time.time()
    l = [i for i in range(times)]
    print "list: {}".format(time.time() - t)
    return l

def returngenerator(times):
    t = time.time()
    g = (i for i in range(times))
    print "generator: {}".format(time.time() - t)
    return g

g = returngenerator(times)
l = returnlist(times)

1. For times = 1000000 I get the results:

generator: 0.107323884964

list: 0.225493192673

2. For times = 10000000 I get:

generator: 0.856524944305

list: 1.83883309364

I understand why the 2nd list would take more time to create but why would the 2nd generator take more time as well? I assumed that due to lazy evaluation it would take about the same time to create as the 1st generator.

I am running this program on an Ubuntu VM.

asked Apr 08 '14 11:04

GeorgeG

1 Answer

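The problem in your code is the range function. In Python 2, range builds a complete list, and a generator expression evaluates its outermost iterable as soon as the expression is created. So range(times) is fully built before your timer stops, and that cost grows with times even though the generator itself has produced nothing yet. In Python 3, range returns a lazy range object instead of a list. In Python 2, the workaround is the xrange function, which is lazy as well.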
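To see the eager part in isolation, here is a minimal sketch. It is written for Python 3, and tracked_range and log are hypothetical names made up for this illustration; the helper returns a list to mimic what Python 2's range does.

```python
def tracked_range(n, log):
    # Hypothetical helper: records the moment the iterable is actually built.
    log.append("built iterable of %d items" % n)
    return list(range(n))  # eager list, like Python 2's range


log = []

# Creating the generator expression already calls tracked_range(...),
# even though no item has been requested from the generator yet.
gen = (i for i in tracked_range(5, log))
built_before_iteration = len(log)

# Only now is the first item pulled out of the generator.
first = next(gen)

print(built_before_iteration, first, log)
```

The log already has an entry before next is called, which is why the timer in returngenerator picks up the cost of building the list.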

As a test, let's create a benchmark function like yours, but using xrange:

def returngenerator2(times):
    t = time.time()
    g = (i for i in xrange(times))
    print "generator2: {}".format(time.time() - t)
    return g

And test it:

>>> l = returnlist(10**7)
list: 0.580000162125
>>> g = returngenerator(10**7)
generator: 0.115000009537
>>> x = returngenerator2(10**7)
generator2: 0.0
>>> x2 = returngenerator2(10**8)
generator2: 0.0
>>> x3 = returngenerator2(10**9)
generator2: 0.0

Seems to work. :)
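For comparison, a small sketch of how range behaves in Python 3, assuming CPython. The variable names are only for illustration. It suggests that range there is a lazy sequence rather than a list or a generator: it stores start, stop and step, and computes items on demand.

```python
import sys

small = range(10)
huge = range(10**9)

# In CPython both objects have the same size, because neither stores its items.
same_size = sys.getsizeof(small) == sys.getsizeof(huge)

# Unlike a generator, a range supports len(), indexing and membership tests.
length = len(huge)
last = huge[-1]
contains = 999999 in huge

# It can also be iterated more than once: each iter() call gives a fresh iterator.
first_pass = next(iter(huge))
second_pass = next(iter(huge))

print(same_size, length, last, contains, first_pass, second_pass)
```

So on Python 3 the original returngenerator should not pay a cost that grows with times, since creating range(times) does no per-item work.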

answered Oct 23 '22 18:10

Carsten
