I really like Python generators. In particular, I find that they are just the right tool for connecting to REST endpoints: my client code only has to iterate over the generator that is connected to the endpoint. However, I have found one area where Python's generators are not as expressive as I would like. Typically, I need to filter the data I get from the endpoint. In my current code, I pass a predicate function to the generator, which applies the predicate to each item and yields it only if the predicate returns True.
I would like to move toward composing generators, like data_filter(datasource()). Here is some demonstration code that shows what I have tried. It is pretty clear why it does not work; what I am trying to figure out is the most expressive way of arriving at a solution:
# Mock of REST endpoint: in the actual code, the generator is
# connected to a REST endpoint which returns a dictionary (from JSON).
def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

# Mock of a filter: a simplification; in reality I am filtering on some
# aspect of the data, like data['type'] == "external"
def data_filter(d):
    if len(d) < 8:
        yield d

# First try:
# for w in data_filter(mock_datasource()):
#     print(w)
# >> TypeError: object of type 'generator' has no len()

# Second try:
# for w in (data_filter(d) for d in mock_datasource()):
#     print(w)
# I don't get words out,
# rather <generator object data_filter at 0x101106a40>

# Using a predicate to filter works, but is not the expressive
# composition I am after
for w in (d for d in mock_datasource() if len(d) < 8):
    print(w)
data_filter should apply len to the elements of d, not to d itself, like this:
def data_filter(d):
    for x in d:
        if len(x) < 8:
            yield x
Now your code:
for w in data_filter(mock_datasource()):
    print(w)
prints
liberty
seminar
formula
comedy
More concisely, you can do this with a generator expression directly:
def length_filter(d, minlen=0, maxlen=8):
    return (x for x in d if minlen <= len(x) < maxlen)
Apply the filter to your generator just like a regular function:
for element in length_filter(endpoint_data()):
    ...
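Because each filter takes an iterable and returns a generator, filters of this shape can be nested to build a pipeline. Here is a minimal sketch, assuming the endpoint yields dictionaries with a 'type' key as the question describes; mock_records, type_filter, and name_length_filter are hypothetical names made up for illustration, not part of any real endpoint:

```python
def mock_records():
    # Hypothetical stand-in for the REST endpoint: yields dicts parsed from JSON.
    records = [
        {"name": "liberty", "type": "external"},
        {"name": "sanctuary", "type": "external"},
        {"name": "seminar", "type": "internal"},
        {"name": "comedy", "type": "external"},
    ]
    yield from records


def type_filter(records, wanted="external"):
    # Lazily pass through only the records whose 'type' matches.
    return (r for r in records if r["type"] == wanted)


def name_length_filter(records, maxlen=8):
    # Lazily pass through only the records whose 'name' is shorter than maxlen.
    return (r for r in records if len(r["name"]) < maxlen)


# Compose the filters by nesting; nothing is consumed until iteration starts.
pipeline = name_length_filter(type_filter(mock_records()))
result = [r["name"] for r in pipeline]
print(result)  # ['liberty', 'comedy']
```

Each stage stays lazy, so records are pulled from the source one at a time as the outermost generator is iterated.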
If your predicate is really simple, the built-in function filter may also meet your needs.
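For instance, here is a small sketch using the mock data from the question; the helper name is_short is an assumption made for illustration. In Python 3, filter returns a lazy iterator, so it composes with a generator in the same way:

```python
def mock_datasource():
    # Same mock data as in the question.
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    yield from mock_data


def is_short(word, maxlen=8):
    # Hypothetical predicate: True for words shorter than maxlen characters.
    return len(word) < maxlen


# filter(predicate, iterable) yields only the items for which predicate is true.
short_words = list(filter(is_short, mock_datasource()))
print(short_words)  # ['liberty', 'seminar', 'formula', 'comedy']
```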