I am trying to get my head around generators and learn about how to use them. I have seen a lot of examples and get that they produce results one at a time rather than outputting them at once like in a regular function. But all the examples I have seen have involved going through a list and printing values that are generated through the function. What if you want to actually create a list?
For example I have seen an example about even numbers which just generates even numbers and prints them out, but what if I want a list of even numbers like this:
def even(k):
    for i in range(k):
        if i % 2 == 0:
            yield i

even_list = []
for i in even(100):
    even_list.append(i)
Does this defeat the purpose of using a generator, since it then builds the list anyway? Does this method still save any memory or time?
Or is the method below, without using generators, just as efficient?
def even(k):
    evens_list = []
    for i in range(k):
        if i % 2 == 0:
            evens_list.append(i)
    return evens_list
In that case, in exactly what situations are generators useful?
Python generator functions let you declare a function that behaves like an iterator, allowing programmers to build an iterator in a fast, easy, and clean way. An iterator is an object that can be iterated (looped) over; it abstracts a container of data so that it behaves like an iterable object.
Generators can be looped over just like a regular list. The benefit of using a loop is that it runs through every yielded value from start to finish, so we do not have to handle the StopIteration exception ourselves when the generator runs out of data.
yield is what makes a generator. A generator function is defined just like a normal function, but whenever it needs to produce a value, it does so with the yield keyword rather than return. If the body of a def contains yield, the function automatically becomes a generator function.
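As a minimal sketch of that behaviour (countdown here is just an illustrative example, not from the question):
from itertools import count  # not needed here; only the built-ins are used

def countdown(n):
    # A generator function: calling it returns a generator object;
    # the body runs only when a value is requested.
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
print(next(gen))  # 3
print(next(gen))  # 2

# A for loop consumes the rest and handles StopIteration for us:
for value in gen:
    print(value)  # 1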
Does this defeat the purpose of using a generator, since it then builds the list anyway? In that case, in what cases exactly are generators useful?
This is a bit opinion-based, but there are some situations where a list might not do the trick (for example because of hardware limitations).
Imagine that you have a generator of even numbers and want the sum of the first five. In Python we can do that with itertools.islice, like:
from itertools import islice

sumfirst5even = sum(islice(even(100), 5))
If we first generated a list of 100 even numbers (not knowing what we would later do with that list), we would have spent a lot of CPU cycles on constructing it, all wasted.
By using a generator, we can restrict the work to only the elements we really need: only the first five elements are ever yielded, and the algorithm never computes anything beyond them (0, 2, 4, 6 and 8). Yes, here it is doubtful that this has any (significant) impact; it is even possible that the "generator protocol" requires more CPU cycles than generating a small list, so for small lists there is no advantage. But now imagine that we used even(100000): the number of "useless" CPU cycles spent on generating an entire list can be significant.
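A quick way to see this laziness (noisy_even is just an instrumented variant for illustration):
from itertools import islice

def noisy_even(k):
    for i in range(k):
        if i % 2 == 0:
            print('computing', i)  # shows which elements are actually produced
            yield i

print(sum(islice(noisy_even(100000), 5)))
# prints "computing 0" through "computing 8", then 20;
# the remaining ~49,995 even numbers are never computed.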
Another potential benefit is saving memory, since we do not need all elements of the generator in memory at the same time. Take the following example:
for x in even(1000):
    print(x)
If even(..) constructs a list of 1000 elements, all of those numbers need to exist as objects in memory at the same time. Depending on the Python interpreter, objects can take a significant amount of memory: in CPython, for example, an int takes 28 bytes, so a list containing 500 such ints takes roughly 14 kB (plus some extra memory for the list itself). Yes, most Python interpreters maintain a "flyweight" pattern to reduce the burden of small ints (these are shared, so we do not create a separate object for each small int we construct), but it can still easily add up. For even(1000000), we would need about 14 MB of memory.
If we use a generator, then depending on how we use it, we might save memory. Why? Because once we no longer need the number 123456 (since the for loop has advanced to the next item), the space that object occupied can be recycled and given to a new int object with value 123458. So, provided the way we use the generator permits it, the memory usage remains constant, whereas for a list it scales linearly. Of course, the generator itself also needs to behave properly: if the generator code builds up a collection internally, then the memory will grow as well.
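A rough way to see the difference (exact sizes vary per interpreter; this sketch assumes CPython):
import sys

def even(k):
    for i in range(k):
        if i % 2 == 0:
            yield i

# The list object alone (just its array of pointers, not counting
# the int objects it refers to) already takes several megabytes:
print(sys.getsizeof(list(even(1000000))))

# A generator object has a small, constant size, regardless of k:
print(sys.getsizeof(even(1000000)))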
On 32-bit systems this can even cause real problems, since Python lists have a maximum length there: a list can contain at most 536,870,912 elements. That is a huge number, but what if you want, for example, to generate all permutations of a given list? If we store the permutations in a list, then for 13 or more elements there are 13! = 6,227,020,800 permutations or more, so on a 32-bit system we could never construct such a list.
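itertools.permutations already works this way: it is a lazy iterator, so we can walk through the permutations one at a time without ever materializing all of them (a small sketch):
from itertools import permutations

# permutations(range(13)) represents 13! = 6,227,020,800 tuples,
# but as a lazy iterator it never stores them all at once.
perms = permutations(range(13))

# Consume only what we need, one tuple at a time:
print(next(perms))  # (0, 1, 2, ..., 11, 12)
print(next(perms))  # (0, 1, 2, ..., 12, 11): the last two swapped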
In theoretical computer science, an "online algorithm" is defined by some researchers as an algorithm that receives its input gradually, and thus does not know the entire input in advance.
A practical example is a webcam that takes an image every second and sends it to a Python webserver. At this moment we do not know what a picture captured by the webcam 24 hours from now will look like, but we might be interested in detecting a burglar who aims to steal something. A list of frames can therefore never contain all the images in advance. A generator, however, can implement an elegant "protocol" where we iteratively fetch an image, detect a burglar, and raise an alarm, like:
for frame in from_webcam():
    if contains_burglar(frame):
        send_alarm_email('Maurice Moss')
We do not need webcams or other hardware to exploit the elegance of generators: generators can yield an "infinite" sequence. An even generator could, for example, look like:
def even():
    i = 0
    while True:
        yield i
        i += 2
This generator will eventually produce every even number. If we keep iterating over it, we will eventually reach 123'456'789'012'345'678 (although it might take a very long time).
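Because the sequence never ends, we must take care never to try to exhaust it; itertools.islice is again the usual tool (a small sketch):
from itertools import islice

# Take only the first ten values of the infinite stream;
# list(even()) would never terminate.
print(list(islice(even(), 10)))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]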
The above can be useful if we want to implement a program that, for example, keeps printing even numbers that are palindromes. That could look like:
for i in even():
    if is_palindrome(i):
        print(i)
We can thus assume that this program keeps working and never needs to "update" a list of even numbers. In some pure functional languages that make lazy programming transparent, programs are written as if you create a list, but under the hood it is typically a generator.
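Here is_palindrome is a helper the answer assumes; a minimal sketch could be:
def is_palindrome(n):
    # Compare the decimal representation with its reverse.
    s = str(n)
    return s == s[::-1]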
range(..) and friends
In Python, many classes do not construct a list when you iterate over them. For example, a range(1000) object does not first build a list (it does in Python 2.x, but not in Python 3.x). The range(..) object simply represents a range. A range(..) object is not a generator, but it is a class that can produce an iterator object that works like a generator.
Besides iterating, we can do all kinds of things with a range(..) object that are possible with lists, but not in an efficient way.
If we want to know, for example, whether 1000000000 is an element of range(400, 10000000000, 2), we can write 1000000000 in range(400, 10000000000, 2). There is an algorithm in place that checks this without generating the range or constructing a list: it checks whether the element is an int, whether it lies within the bounds of the range(..) object (greater than or equal to 400, and less than 10000000000), and whether it would be yielded given the step. None of this requires iterating over the range, so the membership check runs in constant time.
Had we generated a list instead, Python would have to enumerate every element until it finally found the target (or reached the end of the list). For numbers like 1000000000, this can easily take minutes, hours, maybe days.
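A quick check in the REPL shows the effect; both answers come back instantly, since only arithmetic on the bounds and the step is involved:
>>> 1000000000 in range(400, 10000000000, 2)
True
>>> 1000000001 in range(400, 10000000000, 2)
False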
We can also "slice" a range object, which yields another range(..) object, for example:
>>> range(123, 456, 7)[1::4]
range(130, 459, 28)
With this algorithm we can thus instantly slice a range(..) object into a new range object, whereas slicing a list takes linear time, which for huge lists can again cost significant time and memory.
Generators are shorter and more readable. In your example, without a generator you have to create an empty list, call append, and return the resulting list:
def even(k):
    evens_list = []
    for i in range(k):
        if i % 2 == 0:
            evens_list.append(i)
    return evens_list
The generator version just needs yield:
def even(k):
    for i in range(k):
        if i % 2 == 0:
            yield i
And the usage is nearly the same if you really do need a list. Instead of
even_list = even(100)
the line becomes
even_list = list(even(100))
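For a simple filter like this, a generator expression is an even shorter equivalent:
# Equivalent to the even(k) generator function above:
k = 100
even_gen = (i for i in range(k) if i % 2 == 0)
print(list(even_gen))  # [0, 2, 4, ..., 98]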
Generators, and lazy semantics in general, offer some advantages:
- values are computed only on demand, so no work is wasted on elements you never use;
- memory usage stays small, since only one element needs to exist at a time;
- they can represent infinite sequences.
But also some drawbacks:
- a generator can be consumed only once, after which it is exhausted;
- there is no len(), no indexing, and no slicing;
- you cannot inspect the values without consuming them.