
Expressive way to compose generators in Python

I really like Python generators. In particular, I find that they are just the right tool for connecting to REST endpoints: my client code only has to iterate over the generator that is connected to the endpoint. However, I am finding one area where Python's generators are not as expressive as I would like. Typically, I need to filter the data I get out of the endpoint. In my current code, I pass a predicate function to the generator, which applies the predicate to each item it handles and yields the item only if the predicate returns True.

I would like to move toward composition of generators, like data_filter(datasource()). Here is some demonstration code that shows what I have tried. It is pretty clear why it does not work; what I am trying to figure out is the most expressive way of arriving at a solution:

# Mock of REST endpoint: in the actual code, the generator is
# connected to a REST endpoint which returns a dictionary (from JSON).
def mock_datasource ():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula","short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

# Mock of a filter: simplification, in reality I am filtering on some
# aspect of the data, like data['type'] == "external" 
def data_filter (d):
    if len(d) < 8:
        yield d

# First Try:
# for w in data_filter(mock_datasource()):
#     print(w)
# >> TypeError: object of type 'generator' has no len()

# Second Try 
# for w in (data_filter(d) for d in mock_datasource()):
#     print(w)
# I don't get words out, 
# rather <generator object data_filter at 0x101106a40>

# Using a predicate to filter works, but is not the expressive 
# composition I am after
for w in (d for d in mock_datasource() if len(d) < 8):
    print(w)
chladni asked Jan 12 '18 19:01



2 Answers

data_filter receives the whole generator as d, so it should apply len to the elements of d, not to d itself, like this:

def data_filter (d):
    for x in d:
        if len(x) < 8:
            yield x

Now your code:

for w in data_filter(mock_datasource()):
    print(w)

prints:

liberty
seminar
formula
comedy
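
Because each stage takes an iterable and returns a generator, stages written this way can be nested, which is the kind of composition the question asks about. A minimal sketch, assuming a second stage called upper_case that is not in the original code and is included only to illustrate chaining:

```python
def mock_datasource():
    # Stand-in for the REST endpoint, using the word list from the question.
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

def data_filter(items):
    # Consume an iterable and yield only the words shorter than 8 characters.
    for x in items:
        if len(x) < 8:
            yield x

def upper_case(items):
    # Hypothetical extra stage, added only to show that stages chain.
    for x in items:
        yield x.upper()

# Nothing is computed until the outermost generator is iterated.
result = list(upper_case(data_filter(mock_datasource())))
print(result)
```

Each stage pulls one item at a time from the stage inside it, so the full data set is never held in memory unless you collect it, as list() does here.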
Jean-François Fabre answered Oct 10 '22 20:10



More concisely, you can do this with a generator expression directly:

def length_filter(d, minlen=0, maxlen=8):
    return (x for x in d if minlen <= len(x) < maxlen)

Apply the filter to your generator just like a regular function:

for element in length_filter(endpoint_data()):
    ...

If your predicate is really simple, the built-in function filter may also meet your needs.
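
As a rough sketch of the filter option, assuming the mock word list from the question stands in for the real endpoint and is_short is a hypothetical predicate name chosen for illustration:

```python
def mock_datasource():
    # Stand-in for the REST endpoint, using the word list from the question.
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

def is_short(word):
    # Hypothetical predicate: keep words shorter than 8 characters.
    return len(word) < 8

# In Python 3, filter() returns a lazy iterator, so it composes with
# generators without reading the whole source up front.
short_words = filter(is_short, mock_datasource())
collected = list(short_words)
print(collected)
```

This keeps the predicate separate from the iteration logic, at the cost of being limited to a single yes/no test per item.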

wim answered Oct 10 '22 19:10
