Let's assume that I want to find n**2 for all numbers smaller than 20000000.
import time, psutil, gc

gc.collect()
mem_before = psutil.virtual_memory()[3]
time1 = time.time()

# (comprehension, generator, function)-code comes here

time2 = time.time()
mem_after = psutil.virtual_memory()[3]
print "Used Mem = ", (mem_after - mem_before)/(1024**2)  # convert bytes to megabytes
print "Calculation time = ", time2 - time1
1. Creating a list via a comprehension:
x = [i**2 for i in range(20000000)]
It is really slow and time-consuming:
Used Mem = 1270  # megabytes
Calculation time = 33.9309999943  # seconds
2. Creating a generator using '()':
x = (i**2 for i in range(20000000))
It is much faster than option 1, but still uses a lot of memory:
Used Mem = 611
Calculation time = 0.278000116348
3. Defining a generator function (most efficient):
def f(n):
    i = 0
    while i < n:
        yield i**2
        i += 1

x = f(20000000)
Its consumption:
Used Mem = 0
Calculation time = 0.0
'()' creates a generator, so why does it need a lot of memory?

Unlike a list, a generator does not store values. Instead, it knows the current value and how to compute the next one. This makes a generator memory-efficient. The syntax for looping through a generator is the same as for looping through a list.
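A minimal sketch of that behavior (assuming Python 2, to match the code above): the generator hands out one value per call to next() and keeps only its loop state between calls, never a list of results.

x = (i**2 for i in xrange(20000000))
print next(x)  # 0 -- computed on demand
print next(x)  # 1 -- only the current state is kept between calls
print next(x)  # 4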
The generator yields one item at a time and produces an item only on demand, whereas a list comprehension makes Python reserve memory for the whole list up front. Thus we can say that generator expressions are more memory-efficient than lists.
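A rough illustration of that difference (a sketch assuming Python 2; exact byte counts vary by version and platform): sys.getsizeof shows the list growing with its input while the generator object stays a small constant size.

import sys

lst = [i**2 for i in xrange(1000)]
gen = (i**2 for i in xrange(1000))
print sys.getsizeof(lst)  # several kilobytes: the whole result list is allocated
print sys.getsizeof(gen)  # a few dozen bytes, independent of n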
Generators are a memory-efficient way of processing huge datasets. They process the data incrementally and do not allocate memory for all the results at the same time. They really come in handy when implementing data-science pipelines for huge datasets in a RAM-constrained environment.
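For example, a small sketch of such an incremental pipeline (hypothetical stages, assuming Python 2): each stage is lazy, so only one value is in flight at a time and the full dataset is never materialized.

def squares(n):
    # hypothetical first stage: yields one square at a time
    for i in xrange(n):
        yield i ** 2

evens = (s for s in squares(20000000) if s % 2 == 0)  # second lazy stage
print sum(evens)  # pulls values one by one; memory use stays flat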
Implementing our own iterators requires writing a class, as sketched below; generators don't need classes in Python. Generators are faster than iterators, but iterators are more memory-efficient.
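For comparison, a hand-written iterator class could look something like this (an illustrative sketch, not code from the question; the class name Squares is made up, and Python 2 spells the protocol method next rather than __next__):

class Squares(object):
    """Same lazy squares, written as an explicit iterator class."""
    def __init__(self, n):
        self.i = 0
        self.n = n
    def __iter__(self):
        return self
    def next(self):  # __next__ in Python 3
        if self.i >= self.n:
            raise StopIteration
        value = self.i ** 2
        self.i += 1
        return value

x = Squares(20000000)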
As others have pointed out in the comments, range creates a list in Python 2. Hence, it is not the generator per se that uses up the memory, but the range that the generator uses:

x = (i**2 for i in range(20000000))  # builds a 2*10**7 element list, not for the squares, but for the bases

>>> import sys
>>> sys.getsizeof(range(100))
872
>>> sys.getsizeof(xrange(100))
40
>>> sys.getsizeof(range(1000))
8720
>>> sys.getsizeof(xrange(1000))
40
>>> sys.getsizeof(range(20000000))
160000072
>>> sys.getsizeof(xrange(20000000))
40
This also explains why your second version (the generator expression) uses around half the memory of the first version (the list comprehension): the first one builds two lists (for the bases and the squares), while the second only builds one list, for the bases.
xrange(20000000) thus greatly improves memory usage, as it returns a lazy iterable. This is essentially the built-in, memory-efficient way to iterate over a range of numbers, and it mirrors your third version (with the added flexibility of start, stop and step):
x = (i**2 for i in xrange(20000000))
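For instance, the start, stop and step arguments work the same as with range, only lazily (a small sketch):

x = (i**2 for i in xrange(1, 20000000, 2))  # squares of odd bases only, still no list built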
In Python 3, range is essentially what xrange used to be in Python 2. However, the Python 3 range object has some nice features that Python 2's xrange doesn't have, like O(1) slicing, containment checks, etc.
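A quick sketch of those features (this one assumes Python 3):

r = range(20000000)
print(10**7 in r)    # True: O(1) membership test, computed arithmetically
print(r[1000:1005])  # range(1000, 1005): slicing returns another lazy range
print(r[-1])         # 19999999: O(1) indexing, no list built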