
What is the pythonic/idiomatic way of filtering the output of a generator expression?

Suppose we have a generator expression, perhaps a simple one, but not necessarily so:

(function(x) for x in values)

What is the preferred way to filter the values generated by this generator expression? That is, we want to filter not on the value of x, but on the value of function(x).

Of course, adding an if clause to the expression does not do this:

# this only filters on the inputs to the function, not on its results
(function(x) for x in values if _some_condition_expr_) 

I presume that the following would be most pythonic (incidentally also getting rid of the generator expression itself):

_ = lambda x: x  # simple filter for truthy values

filter(_, map(function, values))           # <<< is this the best we can do?
# or
filter(_, (generator_expression_contents_here))

- as opposed to this abomination:

(y for y in (function(x) for x in values) if y)

Is there something I'm missing in generator expressions that would allow filtering the result without nesting expressions etc.? In other words, is the filter(map()) approach the best we can do? I'm not trying to find something esoteric, just making sure that I'm not missing some cleaner or more Pythonic way of doing it.

AFAIK, Python doesn't come with an identity function (_ above), nor with an is_true function.

Kuba hasn't forgotten Monica asked May 19 '21 00:05



4 Answers

This is one of the use cases proposed for assignment expressions (the := operator, available since Python 3.8):

>>> def f(x):
...     return x % 3
... 
>>> g = (fx for x in range(10) if (fx := f(x)))
>>> list(g)
[1, 2, 1, 2, 1, 2]

Adapted from "Simplifying list comprehensions" in PEP 572.

If you're on a Python version older than 3.8, which lacks assignment expressions, or you just find them ugly, then chaining generators is fine:

>>> g1 = (f(x) for x in range(10))
>>> g2 = (x for x in g1 if x)
>>> list(g2)
[1, 2, 1, 2, 1, 2]

Note that simply filtering on truthy values is supported directly by using None instead of a callable:

>>> list(filter(None, range(3)))
[1, 2]

So you could use filter(None, map(f, vals)) to similar effect.
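
As a quick check, the three approaches above can be run side by side. This is only a sketch using the same toy f from this answer; the names vals, via_walrus, via_chain and via_filter are made up for the illustration, and the first variant assumes Python 3.8 or later.

```python
def f(x):
    # Same toy function as above: returns 0 (falsy) for multiples of 3.
    return x % 3

vals = range(10)

# 1. Assignment expression (needs Python 3.8+).
via_walrus = [fx for x in vals if (fx := f(x))]

# 2. Two chained generator expressions: map first, then filter.
mapped = (f(x) for x in vals)
via_chain = [y for y in mapped if y]

# 3. filter(None, ...) keeps only the truthy items of the mapped iterable.
via_filter = list(filter(None, map(f, vals)))

print(via_walrus == via_chain == via_filter)  # True
print(via_filter)                             # [1, 2, 1, 2, 1, 2]
```

All three are lazy when written with generator expressions instead of lists, so the choice is mostly about readability.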

wim answered Oct 19 '22 11:10


It's a matter of debate as to whether this is better, but := can be used here:

filtered = (res for val in values if (res := function(val)))

The result of the function is assigned to res inside the if clause and used there as the predicate. Because the condition is evaluated before the output expression, res is then available to the expression on the left.
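
One thing to watch with the truthiness test is that it also drops legitimate falsy results such as 0 or an empty string. A minimal sketch, assuming a hypothetical parse_int helper that returns None on failure (neither the helper nor the sample values come from the question):

```python
def parse_int(text):
    # Hypothetical helper: returns an int, or None when text is not a number.
    try:
        return int(text)
    except ValueError:
        return None

values = ["3", "x", "0", "7"]

# Plain truthiness: drops None, but also drops the valid result 0.
truthy_only = [res for val in values if (res := parse_int(val))]

# Explicit comparison: drops only the None results.
not_none = [res for val in values if (res := parse_int(val)) is not None]

print(truthy_only)  # [3, 7]
print(not_none)     # [3, 0, 7]
```

The same pattern works in a generator expression; lists are used here only so the results can be printed.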

Carcigenicate answered Oct 19 '22 10:10


If you don't want to use filter, then the cleanest way is to just write another generator expression. If syntactically nested generators are an abomination, then name them instead of nesting them:

bar = (some_function(x) for x in foo)
baz = (y for y in bar if some_condition(y))

Note that you shouldn't use bar for anything else, since it can only be consumed once, and baz wants to consume it.
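
To see why, here is a small sketch of what happens when bar is touched before baz is consumed. The squaring function and the even-number condition are stand-ins chosen for the example, not part of the question.

```python
foo = [1, 2, 3, 4]

bar = (x * x for x in foo)             # stage 1: map, yields 1, 4, 9, 16
baz = (y for y in bar if y % 2 == 0)   # stage 2: filter on the mapped value

# Pulling items from bar directly takes them away from baz.
stolen = [next(bar), next(bar)]
remaining = list(baz)
leftover = list(bar)

print(stolen)     # [1, 4]
print(remaining)  # [16]  (the even value 4 was lost)
print(leftover)   # []    (bar is exhausted once baz has run)
```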

kaya3 answered Oct 19 '22 10:10


You can iterate over the mapped values instead in your generator expression:

(y for y in map(function, values) if y)
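
This form stays lazy and calls the function exactly once per input. A rough sketch that records the calls to show this; the body of function, the calls list and the sample values are assumptions made for the demonstration:

```python
calls = []

def function(x):
    # Hypothetical function that records every call it receives.
    calls.append(x)
    return x - 2

values = [1, 2, 3]
gen = (y for y in map(function, values) if y)

calls_before = list(calls)  # snapshot before anything is consumed
result = list(gen)          # consuming the generator drives map()
calls_after = list(calls)

print(calls_before)  # []
print(result)        # [-1, 1]
print(calls_after)   # [1, 2, 3]
```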

blhsing answered Oct 19 '22 09:10