
Comparing two generators in Python

Tags:

python

I am wondering how == behaves when comparing two generators.

For example:

x = ['1', '2', '3', '4', '5']
gen_1 = (int(ele) for ele in x)
gen_2 = (int(ele) for ele in x)

gen_1 and gen_2 are the same for all practical purposes, and yet when I compare them:

>>> gen_1 == gen_2
False

My guess is that == is treated here as an identity check, the way the is operator normally works, and since gen_1 and gen_2 are located in different places in memory:

>>> gen_1
<generator object <genexpr> at 0x01E8BAA8>
>>> gen_2
<generator object <genexpr> at 0x01EEE4B8>

their comparison evaluates to False. Am I right on this guess? And any other insight is welcome.

And btw, I do know how to compare two generators:

>>> all(a == b for a, b in zip(gen_1, gen_2))
True

or even

>>> list(gen_1) == list(gen_2)
True

But if there is a better way, I'd love to know.

Akavall asked Apr 02 '12 20:04



1 Answer

You are right with your guess – the fallback for comparison of types that don't define == is comparison based on object identity.
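A minimal sketch of that fallback. The Plain class is a hypothetical example, not part of the question; it defines no __eq__, so it behaves the same way the generators do:

```python
class Plain:
    """Hypothetical class that defines no __eq__, so == falls back to identity."""


x = ['1', '2', '3', '4', '5']
gen_1 = (int(ele) for ele in x)
gen_2 = (int(ele) for ele in x)

a, b = Plain(), Plain()

same_object_equal = (gen_1 == gen_1)    # same object on both sides
distinct_gens_equal = (gen_1 == gen_2)  # two distinct generator objects
distinct_plain_equal = (a == b)         # two distinct Plain instances

print(same_object_equal, distinct_gens_equal, distinct_plain_equal)
```

Only the comparison of an object with itself comes out equal; the values the generators would produce are never looked at.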

A better way to compare the values they generate would be

from itertools import zip_longest, tee
sentinel = object()
all(a == b for a, b in zip_longest(gen_1, gen_2, fillvalue=sentinel))

(For Python 2.x, use izip_longest instead of zip_longest.)

This can actually short-circuit without necessarily having to look at all values. As pointed out by larsmans in the comments, we can't use zip() here since it might give wrong results if the generators produce a different number of elements – zip() will stop on the shortest iterator. We use a newly created object instance as fill value for zip_longest(), since object instances are also compared by object identity, so sentinel is guaranteed to compare unequal to everything else.
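The difference between zip() and zip_longest() can be sketched with two generators of unequal length. The helper name generators_equal and the sample values are assumptions made for illustration only:

```python
from itertools import zip_longest


def generators_equal(gen_a, gen_b):
    """Return True if both iterables yield equal items in the same order."""
    sentinel = object()  # fresh object used to pad the shorter iterable
    return all(a == b for a, b in zip_longest(gen_a, gen_b, fillvalue=sentinel))


# zip() stops at the shorter generator, so the extra 4 is never compared.
short = (n for n in [1, 2, 3])
longer = (n for n in [1, 2, 3, 4])
zip_result = all(a == b for a, b in zip(short, longer))

# zip_longest() pairs the extra 4 with the sentinel, which makes the check fail.
short = (n for n in [1, 2, 3])
longer = (n for n in [1, 2, 3, 4])
zip_longest_result = generators_equal(short, longer)

print(zip_result, zip_longest_result)
```

Both versions stop at the first unequal pair, because all() short-circuits.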

Note that there is no way to compare generators without changing their state. You could store the items that were consumed if you need them later on:

gen_1, gen_1_teed = tee(gen_1)
gen_2, gen_2_teed = tee(gen_2)
all(a == b for a, b in zip_longest(gen_1_teed, gen_2_teed, fillvalue=sentinel))

This will leave the state of gen_1 and gen_2 essentially unchanged. All values consumed by all() are stored inside the tee object.
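A runnable sketch of the tee() approach, assuming the comparison is run on the teed copies so that the rebound names gen_1 and gen_2 stay unconsumed. The names are_equal, remaining_1 and remaining_2 are introduced here for illustration:

```python
from itertools import tee, zip_longest

x = ['1', '2', '3', '4', '5']
gen_1 = (int(ele) for ele in x)
gen_2 = (int(ele) for ele in x)

# Rebind each name to one tee branch; the other branch is used for comparing.
gen_1, gen_1_teed = tee(gen_1)
gen_2, gen_2_teed = tee(gen_2)

sentinel = object()
are_equal = all(
    a == b for a, b in zip_longest(gen_1_teed, gen_2_teed, fillvalue=sentinel)
)

# gen_1 and gen_2 still yield every value, served from tee's internal buffer.
remaining_1 = list(gen_1)
remaining_2 = list(gen_2)

print(are_equal, remaining_1, remaining_2)
```

Note that tee() has to buffer every item the comparison has consumed until gen_1 and gen_2 read it as well, so for a full comparison the memory cost is similar to building lists.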

At that point, you might ask yourself if it is really worth it to use lazy generators for the application at hand -- it might be better to simply convert them to lists and work with the lists instead.

Sven Marnach answered Sep 28 '22 10:09