Pandas filter data frame rows by function

I want to filter a data frame by a more complex function based on different values in the row.

Is it possible to filter DataFrame rows with a boolean function, the way you can with, e.g., the ES6 filter function?

An extremely simplified example to illustrate the problem:

    import pandas as pd

    def filter_fn(row):
        # Workaround: return False for rows to drop, the row itself otherwise
        if row['Name'] == 'Alisa' and row['Age'] > 24:
            return False

        return row

    d = {
        'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
                 'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
        'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
        'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77]}

    df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])

    # Note: broadcast=True was deprecated in pandas 0.23 and removed in 1.0;
    # result_type='broadcast' is the replacement in newer versions.
    df = df.apply(filter_fn, axis=1, broadcast=True)

    print(df)

I found an approach using apply(), but with a boolean function it only returns rows filled with True/False, which is expected.

My workaround would be to return the row itself when the function result is True and False otherwise. But this would require an additional filtering step afterwards.

            Name    Age  Score
    0      False  False  False
    1      Bobby     24     63
    2      jodha     23     55
    3       jack     22     74
    4      raghu     23     31
    5   Cathrine     24     77
    6      False  False  False
    7      Bobby     24     63
    8      kumar     22     42
    9      Alisa     23     62
    10      Alex     24     89
    11  Cathrine     24     77
Karl Adler asked Jul 30 '18 08:07




1 Answer

I think using a function here is unnecessary. It is better, and above all faster, to use boolean indexing:

    m = (df['Name'] == 'Alisa') & (df['Age'] > 24)
    print(m)
    0      True
    1     False
    2     False
    3     False
    4     False
    5     False
    6      True
    7     False
    8     False
    9     False
    10    False
    11    False
    dtype: bool

    # invert mask by ~
    df1 = df[~m]
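For reference, here is a self-contained sketch of the boolean indexing approach on the sample data from the question, so the filtered result can be inspected directly. The `DataFrame.query` line is an alternative spelling added for illustration and is not part of the original answer; the names `df1` and `df2` are just placeholders.

```python
import pandas as pd

d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77],
}
df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])

# Mask is True for the rows that should be dropped
m = (df['Name'] == 'Alisa') & (df['Age'] > 24)

# Keep every row where the mask is False
df1 = df[~m]

# Assumed-equivalent spelling using DataFrame.query (illustration only)
df2 = df.query("not (Name == 'Alisa' and Age > 24)")

print(df1)
```

Both variants drop rows 0 and 6 and keep the original index labels; call `reset_index(drop=True)` afterwards if a fresh 0..n-1 index is wanted.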

For more complicated filtering, you could use a function which must return a boolean value:

    def filter_fn(row):
        if row['Name'] == 'Alisa' and row['Age'] > 24:
            return False
        else:
            return True

    df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])
    m = df.apply(filter_fn, axis=1)
    print(m)
    0     False
    1      True
    2      True
    3      True
    4      True
    5      True
    6     False
    7      True
    8      True
    9      True
    10     True
    11     True
    dtype: bool

    df1 = df[m]
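Since the question asks for something like the ES6 filter function, the apply-then-index pattern can be wrapped in a small helper. This is only a sketch: `filter_rows` and `keep_row` are hypothetical names, not pandas API, and the row-wise `apply` will be noticeably slower than a vectorized mask on large frames.

```python
import pandas as pd


def filter_rows(df, predicate):
    """Hypothetical helper: keep the rows for which predicate(row) is True."""
    # apply(axis=1) calls predicate once per row and collects a boolean Series
    mask = df.apply(predicate, axis=1)
    return df[mask]


def keep_row(row):
    # Same condition as filter_fn, written as a single boolean expression
    return not (row['Name'] == 'Alisa' and row['Age'] > 24)


d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77],
}
df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])

filtered = filter_rows(df, keep_row)
print(filtered)
```

The predicate has to return a plain boolean for every row; returning the row itself, as in the question's workaround, would make `apply` produce a DataFrame instead of a mask.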
jezrael answered Oct 06 '22 17:10