
Applying multiple filters to list of tuples

I am looking for an efficient, pythonic way to apply multiple filters to a list of tuples.

As an example, assume filters like this:

    def f1(t): return t[3]<10
    def f2(t): return t[0]!=1
    def f3(t): return t[1] in ("lisa","eric")
    def f4(t): return t[3]>2

And n-tuples (i.e. db-records) like this:

    tuples=[
        (0,'tom','...',8),
        (1,'john','...',17),
        (2,'lisa','...',1),
        (3,'eric','...',18)
    ]

The following works:

    def nFilter(filters,tuples):
        if filters and tuples:
            return nFilter(filters,filter(filters.pop(),tuples))
        else: return tuples

With results like:

    >>> nFilter([f1,f2,f3],tuples)
    [(2, 'lisa', '...', 1)]

and

    >>> nFilter([f1,f2,f3,f4],tuples)
    []

But I'm wondering if there is a more direct way; what I had in mind is something like function composition (i.e. f1(f2(...fn(tuples)...))) for an arbitrary list of functions. The docs reference a functional library containing a compose function, but the links are all dead.

Also, since I'm planning on using this on fairly large data sets, and possibly with a large number of filters in a production web service, it must be efficient, and I can't really say if this solution is.

Any suggestions or improvements are welcome.

Alfred Bratterud asked Sep 12 '12 10:09


2 Answers

Improvement: Replace recursion with iteration

There isn't really "a composition function for an arbitrary list of functions"; however, it is pretty easy to build the filter chain with a simple for-loop:

    def nFilter(filters, tuples):
        for f in filters:
            tuples = filter(f, tuples)
        return tuples
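For readers who still want the composition-style spelling the question asks about, the same chain can be folded with functools.reduce. This is only a minimal sketch: n_filter_reduce, records and predicates are hypothetical names that do not appear in the answer, and the sample data is copied from the question.

```python
from functools import reduce

def n_filter_reduce(filters, tuples):
    """Fold the predicates into nested filter() calls, left to right."""
    # Starting from `tuples`, each step wraps the data in one more filter:
    # filter(f3, filter(f2, filter(f1, tuples)))
    return reduce(lambda data, pred: filter(pred, data), filters, tuples)

# Sample records and predicates taken from the question.
records = [
    (0, 'tom', '...', 8),
    (1, 'john', '...', 17),
    (2, 'lisa', '...', 1),
    (3, 'eric', '...', 18),
]
predicates = [
    lambda t: t[3] < 10,
    lambda t: t[0] != 1,
    lambda t: t[1] in ("lisa", "eric"),
]

result = list(n_filter_reduce(predicates, records))
print(result)  # [(2, 'lisa', '...', 1)]
```

It builds the same nested filters as the for-loop, so the choice between the two is mostly one of readability.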

Improvement: Order filters by restrictiveness and speed

Chained iterators are so fast that the total running time will tend to be dominated by the calls to predicate functions.

The best outcome can be had by ordering the predicates to minimize the total work. In general, it is better to put cheap tests before expensive tests and to put more restrictive tests before tests that don't filter out many cases.

Example

In this example, the predicates have about the same cost (a function call, tuple indexing, and comparison to a constant), but they vary in restrictiveness: t[2]==4 filters out 80% of the cases, while t[0]>1 and t[1]<3 each filter out only 40% of the data.

    >>> from itertools import product
    >>> filters = [lambda t: t[2]==4, lambda t: t[0]>1, lambda t: t[1]<3]
    >>> for tup in nFilter(filters, product(range(5), repeat=3)):
    ...     print(tup)
    ...
    (2, 0, 4)
    (2, 1, 4)
    (2, 2, 4)
    (3, 0, 4)
    (3, 1, 4)
    (3, 2, 4)
    (4, 0, 4)
    (4, 1, 4)
    (4, 2, 4)
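One way to see the effect of ordering is to count predicate calls. The sketch below assumes Python 3 and wraps each predicate in a counter; counted and total_calls are hypothetical helpers, not part of the answer, and n_filter is a local copy of the loop-based function so the block runs on its own.

```python
from itertools import product

def n_filter(filters, tuples):
    # Local copy of the loop-based filter chain.
    for f in filters:
        tuples = filter(f, tuples)
    return tuples

def counted(pred, counter):
    """Wrap a predicate so that every call increments counter[0]."""
    def wrapper(t):
        counter[0] += 1
        return pred(t)
    return wrapper

def total_calls(preds):
    """Return (number of predicate calls, number of surviving tuples)."""
    counter = [0]
    wrapped = [counted(p, counter) for p in preds]
    survivors = list(n_filter(wrapped, product(range(5), repeat=3)))
    return counter[0], len(survivors)

# Same three predicates as the example, in two different orders.
restrictive_first = [lambda t: t[2] == 4, lambda t: t[0] > 1, lambda t: t[1] < 3]
restrictive_last = [lambda t: t[0] > 1, lambda t: t[1] < 3, lambda t: t[2] == 4]

print(total_calls(restrictive_first))  # (165, 9): 125 + 25 + 15 calls
print(total_calls(restrictive_last))   # (245, 9): 125 + 75 + 45 calls
```

Both orders return the same nine tuples; only the amount of predicate work differs.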

Notes hoisted-up from the comments

  • The filter functions make zero applications of the predicate when the input iterable is empty. It is like doing a for-loop over an empty list.

  • Each filter reduces the amount of data fed into the enclosing filter. Accordingly, each filter only gets applied to data that has made it through the previous filters.

  • Don't worry about the lambda in the example. It makes the same function as a regular def. It is just a convenient way of writing the list of filters.

  • In Python 3, the filter() function was updated to return an iterator instead of a list. In Python 2, you can achieve the same effect using itertools.ifilter() instead of filter().
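The notes on laziness and empty input can be checked with a small sketch, assuming Python 3, where filter() returns an iterator. The noisy predicate and the seen list are hypothetical, added only to record which tuples get tested.

```python
def n_filter(filters, tuples):
    # Local copy of the loop-based filter chain.
    for f in filters:
        tuples = filter(f, tuples)
    return tuples

seen = []  # records every tuple the predicate is asked about

def noisy(t):
    seen.append(t)
    return t[0] > 1

chain = n_filter([noisy], [(0,), (2,), (3,)])
seen_after_build = list(seen)   # []: building the chain calls nothing

first = next(chain)             # pulls items only until the first match
seen_after_next = list(seen)    # [(0,), (2,)]; (3,) has not been tested yet

seen.clear()
empty_result = list(n_filter([noisy], []))
seen_on_empty = list(seen)      # []: empty input means zero predicate calls
```

Under Python 2's built-in filter(), which builds a list immediately, the first check would not hold.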

Raymond Hettinger answered Oct 17 '22 09:10


Are you looking for something like this?

    filters = (f1,f2,f3,f4)
    filtered_list = filter( lambda x: all(f(x) for f in filters), your_list )

This has the advantage that as soon as a single filter returns False, all() short-circuits: the remaining filters are not called for that element, and it is left out of the result.
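A rough way to confirm the short-circuit is to count how often each filter runs. In this sketch, cheap and costly are hypothetical stand-ins for f1 and f3 from the question, and the calls dictionary exists only for bookkeeping.

```python
calls = {"cheap": 0, "costly": 0}

def cheap(t):
    calls["cheap"] += 1
    return t[3] < 10

def costly(t):
    calls["costly"] += 1
    return t[1] in ("lisa", "eric")

# Sample records taken from the question.
records = [
    (0, 'tom', '...', 8),
    (1, 'john', '...', 17),
    (2, 'lisa', '...', 1),
    (3, 'eric', '...', 18),
]

filters = (cheap, costly)
# list() forces evaluation, since filter() is lazy in Python 3.
filtered = list(filter(lambda x: all(f(x) for f in filters), records))

print(filtered)  # [(2, 'lisa', '...', 1)]
print(calls)     # {'cheap': 4, 'costly': 2}
```

Here costly runs only for the two records that pass cheap, which is the same saving the chained-filter version gets from putting restrictive tests first.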

mgilson answered Oct 17 '22 11:10