
Why does a generator expression need a lot of memory?

Problem

Let's assume that I want to find n**2 for all numbers smaller than 20000000.

General setup for all three variants that I test:

import time, psutil, gc

gc.collect()
mem_before = psutil.virtual_memory()[3]
time1 = time.time()

# (comprehension, generator, function)-code comes here

time2 = time.time()
mem_after = psutil.virtual_memory()[3]

print "Used Mem = ", (mem_after - mem_before)/(1024**2)  # convert bytes to megabytes
print "Calculation time = ", time2 - time1

Three options to calculate these numbers:

1. Creating a list via a comprehension:

x = [i**2 for i in range(20000000)] 

It is slow and consumes a lot of memory:

Used Mem =  1270  # Megabytes
Calculation time =  33.9309999943  # Seconds

2. Creating a generator using '()':

x = (i**2 for i in range(20000000)) 

It is much faster than option 1, but still uses a lot of memory:

Used Mem =  611  # Megabytes
Calculation time =  0.278000116348  # Seconds

3. Defining a generator function (most efficient):

def f(n):
    i = 0
    while i < n:
        yield i**2
        i += 1

x = f(20000000)

Its consumption:

Used Mem =  0  # Megabytes
Calculation time =  0.0  # Seconds
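As a cross-check, the list-vs-generator contrast above can be reproduced on Python 3 (where range is already lazy) using the standard tracemalloc module instead of psutil. This is a sketch, not the original harness; the element count is deliberately small to keep the run fast:

```python
import tracemalloc

def peak_allocated(make):
    """Return (result, peak bytes allocated) while calling make()."""
    tracemalloc.start()
    result = make()
    peak = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    return result, peak

n = 200000  # far smaller than the question's 20000000, same effect

squares, list_peak = peak_allocated(lambda: [i**2 for i in range(n)])
gen, gen_peak = peak_allocated(lambda: (i**2 for i in range(n)))

print(list_peak)  # megabytes: every square is stored up front
print(gen_peak)   # a few hundred bytes: only the generator object itself
```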

The questions are:

  1. What's the difference between the first and second solutions? Using () creates a generator, so why does it need a lot of memory?
  2. Is there any built-in function equivalent to my third option?
Asked by EbraHim on May 11 '16.

People also ask

Why are generators memory efficient?

Unlike a list, a generator does not store values. Instead, it knows the current value and how to get the next one. This makes a generator memory-efficient. The syntax of looping through a generator is the same as looping through a list.
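The "knows the current value and how to get the next one" behavior is easy to observe with next() (shown here in Python 3 syntax):

```python
g = (i**2 for i in range(3))  # nothing is computed yet

print(next(g))   # first value, computed on demand
print(next(g))   # second value
print(list(g))   # only the values not yet consumed remain
```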

Do generators consume more memory than lists?

A generator yields one item at a time and produces items only on demand. In a list comprehension, by contrast, Python reserves memory for the whole list. Thus generator expressions are more memory-efficient than lists.
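This size difference can be verified directly with sys.getsizeof (a Python 3 sketch; note it measures only the container object, not the ints inside the list):

```python
import sys

n = 100000
squares_list = [i**2 for i in range(n)]   # all n results stored up front
squares_gen = (i**2 for i in range(n))    # only a small generator object

print(sys.getsizeof(squares_list))  # grows with n (hundreds of KB here)
print(sys.getsizeof(squares_gen))   # small and independent of n
```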

Is a generator more memory efficient?

Generators are memory-efficient ways of processing huge datasets. They process the data incrementally and do not allocate memory to all the results at the same time. They really come in handy when implementing data science pipelines for huge datasets in a resource-constrained environment (in terms of RAM).

Which is more memory efficient iterator or generator?

Implementing our own iterator requires writing a class, as shown earlier; generators don't need classes in Python. Generators are faster than iterators, but iterators are more memory-efficient.
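To illustrate the class-vs-generator comparison, here is a hypothetical Squares iterator class next to an equivalent generator function (Python 3 sketch; both produce the same values):

```python
class Squares:
    """Iterator over i**2 for i < n, written as a class."""
    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        value = self.i ** 2
        self.i += 1
        return value

def squares(n):
    """The same iteration protocol, written as a generator function."""
    for i in range(n):
        yield i ** 2

print(list(Squares(5)) == list(squares(5)))  # both yield 0, 1, 4, 9, 16
```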


1 Answer

  1. As others have pointed out in the comments, range creates a list in Python 2. Hence, it is not the generator per se that uses up the memory, but the range that the generator uses:

    x = (i**2 for i in range(20000000))
    # builds a 2*10**7 element list, not for the squares, but for the bases

    >>> sys.getsizeof(range(100))
    872
    >>> sys.getsizeof(xrange(100))
    40
    >>> sys.getsizeof(range(1000))
    8720
    >>> sys.getsizeof(xrange(1000))
    40
    >>> sys.getsizeof(range(20000000))
    160000072
    >>> sys.getsizeof(xrange(20000000))
    40

    This also explains why your second version (the generator expression) uses around half the memory of the first version (the list comprehension) as the first one builds two lists (for the bases and the squares) while the second only builds one list for the bases.

  2. Using xrange(20000000) thus greatly improves memory usage, as it returns a lazy iterable. This is essentially the built-in, memory-efficient way to iterate over a range of numbers, mirroring your third version (with the added flexibility of start, stop and step arguments):

    x = (i**2 for i in xrange(20000000)) 

    In Python 3, range is essentially what xrange used to be in Python 2. However, the Python 3 range object has some nice features that Python 2's xrange doesn't have, like O(1) slicing, O(1) membership tests, etc.
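Those Python 3 range features are cheap to demonstrate; none of the operations below iterate over the twenty million elements:

```python
r = range(20000000)

print(19999999 in r)   # O(1) membership test, computed arithmetically
print(r[-1])           # O(1) indexing
print(r[10:20:2])      # O(1) slicing: returns another range object
```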

Some references:

  • Python2 xrange docs
  • Python3 range docs
  • Stack Overflow - "Should you always favor xrange() over range()?"
  • Martijn Pieters' excellent answer to "Why is 1000000000000000 in range(1000000000000001) so fast in Python 3?"
Answered by user2390182 on Sep 18 '22.