I am looking for an efficient, pythonic way to apply multiple filters to a list of tuples.
As an example, assume filters like this:
    def f1(t): return t[3]<10
    def f2(t): return t[0]!=1
    def f3(t): return t[1] in ("lisa","eric")
    def f4(t): return t[3]>2
And n-tuples (i.e. db-records) like this:
    tuples=[
        (0,'tom','...',8),
        (1,'john','...',17),
        (2,'lisa','...',1),
        (3,'eric','...',18)
    ]
The following works:
    def nFilter(filters,tuples):
        if filters and tuples:
            return nFilter(filters,filter(filters.pop(),tuples))
        else:
            return tuples
With results like:
    >>> nFilter([f1,f2,f3],tuples)
    [(2, 'lisa', '...', 1)]
and
    >>> nFilter([f1,f2,f3,f4],tuples)
    []
But I'm wondering if there is a more direct way; what I had in mind is something like function composition (i.e. f1(f2(...fn(tuples)...))), for an arbitrary list of functions. There are references in the docs to a functional library containing a compose function, but the links are all dead.
Also, since I'm planning on using this on fairly large data sets, and possibly with a large number of filters in a production web service, it must be efficient, and I can't really say if this solution is.
Any suggestions or improvements are welcome.
To filter a list of tuples in Python: Use the filter() function to filter the list. The filter function returns an iterator containing the results. Pass the filter object to the list() class to convert it to a list.
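As a minimal sketch of the steps just described, the following applies a single predicate to the sample records from the question and converts the result to a list. The name is_small is hypothetical and only used for illustration.

```python
# Sample records borrowed from the question above.
records = [
    (0, 'tom', '...', 8),
    (1, 'john', '...', 17),
    (2, 'lisa', '...', 1),
    (3, 'eric', '...', 18),
]

def is_small(t):
    """Hypothetical predicate: keep records whose last field is below 10."""
    return t[3] < 10

lazy = filter(is_small, records)  # in Python 3 this is a lazy iterator, not a list
small_records = list(lazy)        # consume the iterator to materialize the results

print(small_records)
```

Note that the filter object can only be consumed once; call list() on it if the results are needed more than once.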
Python has a built-in function called filter() that lets you filter a list (or a tuple) concisely. filter(fn, iterable) calls fn on each element of the iterable and returns an iterator over the elements for which fn returns a true value.
filter() is a built-in function rather than a method. It can filter the items of any iterable, such as a string, list, or dictionary, based on a condition: items for which the condition returns true are kept, and items for which it returns false are discarded.
There isn't really "a composition function for an arbitrary list of functions"; however, it is pretty easy to build the filter chain with a simple for-loop:
    def nFilter(filters, tuples):
        for f in filters:
            tuples = filter(f, tuples)
        return tuples
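To make the loop above easy to try, here is a self-contained Python 3 sketch that runs it against the filters and records from the question. The snake_case names n_filter and n_filter_reduce are my own; the functools.reduce variant is an alternative spelling of the same chain, not something the answer proposes.

```python
from functools import reduce

# Predicates and records copied from the question.
def f1(t): return t[3] < 10
def f2(t): return t[0] != 1
def f3(t): return t[1] in ("lisa", "eric")
def f4(t): return t[3] > 2

tuples = [
    (0, 'tom', '...', 8),
    (1, 'john', '...', 17),
    (2, 'lisa', '...', 1),
    (3, 'eric', '...', 18),
]

def n_filter(filters, items):
    """Wrap items in one filter() per predicate; nothing runs until iteration."""
    for f in filters:
        items = filter(f, items)
    return items

def n_filter_reduce(filters, items):
    """Build the same chain with functools.reduce (assumed alternative)."""
    return reduce(lambda acc, f: filter(f, acc), filters, items)

# list() is needed in Python 3 because filter() returns an iterator.
result_3 = list(n_filter([f1, f2, f3], tuples))
result_4 = list(n_filter([f1, f2, f3, f4], tuples))

print(result_3)
print(result_4)
```

Unlike the recursive version in the question, neither function mutates the list of filters passed in.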
Chained iterators are so fast that the total running time will tend to be dominated by the calls to predicate functions.
The best outcome can be had by ordering the predicates to minimize the total work. In general, it is better to put cheap tests before expensive tests and to put more restrictive tests before tests that don't filter out many cases.
In this example, the predicates have about the same cost (a function call, tuple indexing, and comparison to a constant), but they vary in restrictiveness (the t[2]==4 test filters out 80% of the cases, while the t[0]>1 and t[1]<3 tests each filter out only 40% of the data).
    >>> from itertools import product
    >>> filters = [lambda t: t[2]==4, lambda t: t[0]>1, lambda t: t[1]<3]
    >>> for tup in nFilter(filters, product(range(5), repeat=3)):
    ...     print(tup)
    ...
    (2, 0, 4)
    (2, 1, 4)
    (2, 2, 4)
    (3, 0, 4)
    (3, 1, 4)
    (3, 2, 4)
    (4, 0, 4)
    (4, 1, 4)
    (4, 2, 4)
The filter functions make zero applications of the predicate when the input iterable is empty. It is like doing a for-loop over an empty list.
Each filter reduces the amount of data fed into the enclosing filter. Accordingly, each filter only gets applied to data that has made it through the previous filters.
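One way to check the points above about ordering and about later filters seeing only the survivors is to count predicate calls. The sketch below is illustrative only: the helpers counted and run are hypothetical, and n_filter is a Python 3 restatement of the loop shown earlier.

```python
from itertools import product

def n_filter(filters, items):
    """Chain one lazy filter() per predicate."""
    for f in filters:
        items = filter(f, items)
    return items

def counted(pred, counts, name):
    """Hypothetical helper: wrap pred so each call is tallied in counts[name]."""
    def wrapper(t):
        counts[name] += 1
        return pred(t)
    return wrapper

def run(order):
    """Apply the three example predicates in the given order and count calls."""
    preds = {
        "t[2]==4": lambda t: t[2] == 4,
        "t[0]>1": lambda t: t[0] > 1,
        "t[1]<3": lambda t: t[1] < 3,
    }
    counts = {name: 0 for name in order}
    filters = [counted(preds[name], counts, name) for name in order]
    survivors = list(n_filter(filters, product(range(5), repeat=3)))
    return survivors, counts

restrictive_first, counts_a = run(["t[2]==4", "t[0]>1", "t[1]<3"])
restrictive_last, counts_b = run(["t[0]>1", "t[1]<3", "t[2]==4"])

print(counts_a, sum(counts_a.values()))
print(counts_b, sum(counts_b.values()))
```

On the 125 input tuples, putting the most restrictive test first costs 125 + 25 + 15 = 165 predicate calls, while putting it last costs 125 + 75 + 45 = 245, for the same nine results.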
Don't worry about the lambda in the example. It makes the same function as a regular def. It is just a convenient way of writing the list of filters.
In Python 3, the filter() function was updated to return an iterator instead of a list. In Python 2, you can achieve the same effect using itertools.ifilter() instead of filter().
Are you looking for something like this?
    filters = (f1,f2,f3,f4)
    filtered_list = filter( lambda x: all(f(x) for f in filters), your_list )
This has the advantage that as soon as a single filter returns False, all() stops evaluating the remaining filters for that list element and the element is excluded.
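Here is a runnable Python 3 sketch of the all() approach using the data from the question. The traced wrapper and the calls list are hypothetical additions, included only to show which filters actually run for one record.

```python
# Predicates and records copied from the question.
def f1(t): return t[3] < 10
def f2(t): return t[0] != 1
def f3(t): return t[1] in ("lisa", "eric")
def f4(t): return t[3] > 2

tuples = [
    (0, 'tom', '...', 8),
    (1, 'john', '...', 17),
    (2, 'lisa', '...', 1),
    (3, 'eric', '...', 18),
]

filters = (f1, f2, f3, f4)

# list() materializes the iterator that filter() returns in Python 3.
filtered_list = list(filter(lambda x: all(f(x) for f in filters), tuples))

# The same idea written as a list comprehension, with only three filters.
first_three = [t for t in tuples if all(f(t) for f in (f1, f2, f3))]

# Hypothetical tracing wrapper to observe the short-circuit for one record.
calls = []
def traced(f):
    def wrapper(t):
        calls.append(f.__name__)
        return f(t)
    return wrapper

tom_passes = all(f(tuples[0]) for f in [traced(f) for f in filters])

print(filtered_list)
print(first_three)
print(tom_passes, calls)
```

For the 'tom' record, f3 is the first filter to fail, so f4 is never called.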