The code below sums all the numbers in the list held in all_numbers. This makes sense, as all the numbers to be summed are held in the list.
def firstn(n):
    '''Return a list of the numbers from 0 to n - 1.'''
    num, nums = 0, []
    while num < n:
        nums.append(num)
        num += 1
    return nums

# all numbers are held in a list which is memory intensive
all_numbers = firstn(100000000)
sum_of_first_n = sum(all_numbers)
# Uses 3.8 GB during processing and 1.9 GB to store variables
# 13.9 seconds to process
sum_of_first_n
When I convert the above function to a generator function, I get the same result with less memory used (code below). What I don't understand is how all_numbers can be summed if it doesn't contain all the numbers in a list like above.
If the numbers are generated on demand, then all of them would still have to be generated in order to sum them, so where are these numbers stored, and how does this translate to reduced memory usage?
def firstn(n):
    num = 0
    while num < n:
        yield num
        num += 1

# all numbers are held in a generator
all_numbers = firstn(100000000)
sum_of_first_n = sum(all_numbers)
# Uses < 100 MB during processing and to store variables
# 9.4 seconds to process
sum_of_first_n
I understand how to create a generator function and why you would want to use them but I don't understand how they work.
Key takeaways: Generators are memory-friendly because they produce each piece of data only when it is demanded, rather than storing everything up front. We can define generators with generator expressions or generator functions.
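As a minimal sketch of the two ways of defining a generator mentioned above (the names squares_func, squares_expr and squares_list are made up for illustration), the following compares a generator function, a generator expression and a plain list. The sizes reported by sys.getsizeof cover only the container object itself, so treat them as a rough indication rather than a full memory measurement.

```python
import sys

def squares_func(n):
    """Generator function: yields the squares of 0..n-1 one at a time."""
    for i in range(n):
        yield i * i

# Generator expression: the same sequence, written inline.
squares_expr = (i * i for i in range(1000))

# A list comprehension, by contrast, builds every value up front.
squares_list = [i * i for i in range(1000)]

total_from_func = sum(squares_func(1000))
total_from_expr = sum(squares_expr)
total_from_list = sum(squares_list)

# The generator object stays small no matter how many values it will produce;
# the list object grows with the number of items it holds.
gen_size = sys.getsizeof(squares_func(1000))
list_size = sys.getsizeof(squares_list)
print(total_from_func, total_from_expr, total_from_list)
print(gen_size, list_size)
```

All three totals should agree; only the list pays for holding every element at once.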
Simply speaking, a generator is a function that returns an object (iterator) which we can iterate over (one value at a time).
A generator function is a special type of function that does not return a single value; instead, it returns an iterator object that produces a sequence of values. In a generator function, a yield statement is used rather than a return statement.
You need to call next() or loop over the generator object to access the values it produces. When there is no next value in the generator object, a StopIteration exception is raised. A for loop can be used to iterate over the generator object.
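A small sketch of that behaviour, using a hypothetical countdown generator that is not part of the original question: next() pulls one value at a time, StopIteration signals exhaustion, and a for loop handles that exception for you.

```python
def countdown(n):
    """Hypothetical generator for illustration: yields n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1

gen = countdown(2)
first = next(gen)   # runs the body up to the first yield
second = next(gen)  # resumes after the yield and runs to the next one

try:
    next(gen)       # the while loop ends, so there is nothing left to yield
    exhausted = False
except StopIteration:
    exhausted = True

# A for loop calls next() behind the scenes and stops on StopIteration.
collected = []
for value in countdown(3):
    collected.append(value)

print(first, second, exhausted, collected)
```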
A generator does not store its values. Think of a generator as a function with context: it saves its state and GENERATES the values each time it is asked to do so. It gives you a value, then "discards" it, holds the context of the computation, and waits until you ask for more; it keeps doing so until the generator is exhausted.
def firstn(n):
    num = 0
    while num < n:
        yield num
        num += 1
In the example you provide, the "only" memory used is for num, which is where the state of the computation is stored; the firstn generator holds num in its context until the while loop is finished.
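One way to see this context directly, as a sketch that relies on CPython's gi_frame attribute of generator objects (an implementation detail, so treat it as an assumption about your interpreter), is to look at the paused frame's local variables between calls to next():

```python
def firstn(n):
    num = 0
    while num < n:
        yield num
        num += 1

gen = firstn(3)

next(gen)  # paused at the first yield
state_after_first = dict(gen.gi_frame.f_locals)

next(gen)  # resumed, incremented num, paused at yield again
state_after_second = dict(gen.gi_frame.f_locals)

remaining = list(gen)                  # drain whatever is left
frame_after_exhaustion = gen.gi_frame  # CPython drops the frame once exhausted

print(state_after_first)
print(state_after_second)
print(remaining, frame_after_exhaustion)
```

The paused frame only ever contains n and num; the values already yielded are not kept anywhere inside the generator.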
I think a real example of what your first and second functions are doing under the hood will be helpful, and you'll understand better what's going on.
Let's print what Python is hiding while processing each function, using locals():
locals(): Update and return a dictionary representing the current local symbol table. Free variables are returned by locals() when it is called in function blocks, but not in class blocks.
>>> def firstn(n):
...     '''Return a list of the numbers from 0 to n - 1.'''
...     num, nums = 0, []
...     while num < n:
...         nums.append(num)
...         num += 1
...         print(locals())
...     return nums
...
>>> firstn(10)
Will print:
{'nums': [0], 'n': 10, 'num': 1}
{'nums': [0, 1], 'n': 10, 'num': 2}
{'nums': [0, 1, 2], 'n': 10, 'num': 3}
{'nums': [0, 1, 2, 3], 'n': 10, 'num': 4}
{'nums': [0, 1, 2, 3, 4], 'n': 10, 'num': 5}
{'nums': [0, 1, 2, 3, 4, 5], 'n': 10, 'num': 6}
{'nums': [0, 1, 2, 3, 4, 5, 6], 'n': 10, 'num': 7}
{'nums': [0, 1, 2, 3, 4, 5, 6, 7], 'n': 10, 'num': 8}
{'nums': [0, 1, 2, 3, 4, 5, 6, 7, 8], 'n': 10, 'num': 9}
{'nums': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'n': 10, 'num': 10}
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
But:
>>> def firstn(n):
...     num = 0
...     while num < n:
...         yield num
...         num += 1
...         print(locals())
...
>>> list(firstn(10))
Will print:
{'n': 10, 'num': 1}
{'n': 10, 'num': 2}
{'n': 10, 'num': 3}
{'n': 10, 'num': 4}
{'n': 10, 'num': 5}
{'n': 10, 'num': 6}
{'n': 10, 'num': 7}
{'n': 10, 'num': 8}
{'n': 10, 'num': 9}
{'n': 10, 'num': 10}
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
So, as you can see, the second function (your generator) doesn't care about past or future results. It remembers only the current value of num (which also serves as the condition for ending the while loop) and generates the results on demand.
However, in your first example, the function needs to store and remember every value along the way, together with the value used to stop the while loop, before returning the final result. That makes it far more memory-hungry, and slower, than your generator.
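To put rough numbers on that difference, here is a sketch using the standard-library tracemalloc module. The helper peak_bytes and the names firstn_list and firstn_gen are introduced purely for illustration, and N is deliberately much smaller than the 100000000 in the question so that it runs quickly; absolute figures will vary by Python version and platform.

```python
import tracemalloc

def firstn_list(n):
    """List version: every number is kept alive until the function returns."""
    nums = []
    num = 0
    while num < n:
        nums.append(num)
        num += 1
    return nums

def firstn_gen(n):
    """Generator version: only the current number is kept alive."""
    num = 0
    while num < n:
        yield num
        num += 1

def peak_bytes(make_iterable, n):
    """Sum the iterable and report the peak memory traced while doing so."""
    tracemalloc.start()
    total = sum(make_iterable(n))
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return total, peak

N = 200_000
list_total, list_peak = peak_bytes(firstn_list, N)
gen_total, gen_peak = peak_bytes(firstn_gen, N)

print(f"list:      total={list_total}, peak={list_peak} bytes")
print(f"generator: total={gen_total}, peak={gen_peak} bytes")
```

Both versions compute the same total; the list version's peak grows with N, while the generator's peak stays roughly constant.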
This example may help you understand how and when the items are calculated:
def firstn(n):
    num = 0
    while num < n:
        yield num
        print('incrementing num')
        num += 1

gen = firstn(n=10)

a0 = next(gen)
print(a0)       # 0
a1 = next(gen)  # incrementing num
print(a1)       # 1
a2 = next(gen)  # incrementing num
print(a2)       # 2
The function does not return, but it keeps its internal state (stack frame) and continues from the point where it last yielded.
The for loop just calls next repeatedly.
Your next value is calculated on demand; not all of the possible values need to be in memory at the same time.
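This is also why sum() works on a generator without ever holding all the numbers. The built-in is implemented in C, so the following manual_sum is only an approximate pure-Python sketch of the idea (the name is made up for illustration): it keeps a single running total and asks for one value at a time.

```python
def firstn(n):
    num = 0
    while num < n:
        yield num
        num += 1

def manual_sum(iterable):
    """Approximate pure-Python equivalent of sum(): one running total."""
    total = 0
    iterator = iter(iterable)
    while True:
        try:
            value = next(iterator)  # ask the generator for one more value
        except StopIteration:
            break                   # generator exhausted, nothing left to add
        total += value              # after this, value is no longer needed
    return total

result = manual_sum(firstn(10))
print(result)
```

At any moment only total, the current value and the generator's own num exist, which is where the memory saving in the question comes from.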