
Python Splitting a Generator Yield into Two Parts

I have access to a generator that yields two values:

def get_document_values():
    docs = query_database()  # returns a cursor to database documents
    for doc in docs:
        # doc is a dictionary like, say, {'x': 1, 'y': 99}
        yield doc['x'], doc['y']

I have another function, process_x, that I cannot change. It takes a generator as input and processes the x values of all documents (if a tuple is yielded, it processes only the first element of the tuple and ignores the others):

X = process_x(get_document_values())  # This processes x but ignores y

However, I need to store all of the y values from the generator as well. My only solution is to execute get_document_values twice:

Y = [y for x, y in get_document_values()]  # Throw away x
X = process_x(get_document_values())       # Throw away y

This technically works, but when there are many documents to process, a new document may be inserted into the database between the two calls, so the lengths of X and Y will differ. There needs to be a one-to-one mapping between X and Y, and I'd like to call get_document_values only once instead of twice.

I've considered something like:

Y = []

def process_y(doc_generator):
    global Y
    for x,y in doc_generator:
        Y.append(y)
        yield x

X = process_x(process_y(get_document_values()))

But:

  1. This doesn't feel pythonic
  2. Y needs to be declared as a global variable

I am hoping that there is a cleaner, more pythonic way to do this.

Update

In reality, get_document_values will return values of x that are too large to be collectively stored into memory and process_x is actually reducing that memory requirement. So, it is not possible to cache all of x. Caching all of y is fine though.

slaw asked Nov 07 '18 13:11




2 Answers

You can cache all the values in a list with a single call (this only works when everything fits in memory):

all_values = [(x,y) for x,y in get_document_values()] #or list(get_document_values())

You can get an iterator to y values with:

from operator import itemgetter
Y = map(itemgetter(1), all_values)

And for x, simply use:

X = process_x(map(itemgetter(0), all_values))

The other option is to separate the generator, for example:

from operator import itemgetter

def get_document_values(getter):
    docs = query_database()  # returns a cursor to database documents
    for doc in docs:
        # doc is a dictionary like, say, {'x': 1, 'y': 99}
        yield getter(doc)

X = process_x(get_document_values(itemgetter("x")))
Y = list(get_document_values(itemgetter("y")))

This way you have to run the query twice. If you find a way to run the query once and duplicate the cursor, you can abstract that as well:

def get_document_values(cursor, getter):
    for doc in cursor:
        # doc is a dictionary with ,say, {'x': 1, 'y': 99}
        yield getter(doc)
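
If the cursor cannot be duplicated, one way to run the query only once is to peel y off while x streams through to process_x. The sketch below is only an illustration: split_off is a hypothetical helper name, and get_document_values and process_x are stand-ins with made-up data, not the asker's real functions. It follows the same idea as the asker's process_y, but passes the list in as an argument instead of using a global.

```python
def split_off(pairs, sink):
    """Yield the first item of each pair and append the second item to sink."""
    for first, second in pairs:
        sink.append(second)
        yield first


# Hypothetical stand-in for the real generator (made-up documents).
def get_document_values():
    docs = [{'x': 1, 'y': 99}, {'x': 2, 'y': 98}, {'x': 3, 'y': 97}]
    for doc in docs:
        yield doc['x'], doc['y']


# Hypothetical stand-in for the real process_x: a reduction that
# consumes its input one item at a time without storing it.
def process_x(values):
    return sum(values)


Y = []
X = process_x(split_off(get_document_values(), Y))
```

Because Y is filled as a side effect of process_x consuming the generator, X and Y come from the same single pass and stay one-to-one. Y is complete only after process_x has exhausted its input.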

Netwave answered Oct 19 '22 19:10


No need to save the data:

import itertools

def process_entry(x, y):
    process_x((x,))  # assumes process_x does its work eagerly when called
    return y

ys = itertools.starmap(process_entry, your_generator)

Just remember that an x value is processed only when its corresponding y value is consumed.
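
A small sketch makes this laziness visible. Everything here is an assumption for illustration: process_x is a hypothetical eager stand-in that just records what it has seen, and the pairs are made-up data.

```python
import itertools

processed = []  # records which x values have been handled so far


def process_x(values):
    # Hypothetical eager stand-in for the real process_x.
    for x in values:
        processed.append(x)


def process_entry(x, y):
    process_x((x,))
    return y


pairs = iter([(1, 99), (2, 98), (3, 97)])
ys = itertools.starmap(process_entry, pairs)

before = list(processed)      # snapshot before any y has been requested
first_y = next(ys)            # pulling one y processes exactly one x
after_one = list(processed)
rest = list(ys)               # draining ys processes the remaining x values
```

Creating ys does no work at all. Each x is handled only at the moment its y is pulled, so ys must be fully consumed if every x needs to be processed.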

If you need both, return both as a tuple:

def process_entry(x, y):
    return next(process_x((x,))), y
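
As a rough illustration of the tuple variant, assume process_x is itself a generator that yields one result per input. That is only a guess about its behavior, and the doubling below is a made-up placeholder for the real work.

```python
import itertools


def process_x(values):
    # Hypothetical stand-in: yields one processed result per input value.
    for x in values:
        yield x * 2


def process_entry(x, y):
    # next() pulls the single result produced for this one-element input.
    return next(process_x((x,))), y


results = list(itertools.starmap(process_entry, [(1, 99), (2, 98)]))
```

This calls process_x once per document, so it fits only when process_x treats each x independently. If process_x reduces across all x values in one pass, this per-entry approach does not apply.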

Reut Sharabani answered Oct 19 '22 19:10