I want to filter a DataFrame with a more complex function that depends on several values in each row.
Is it possible to filter DataFrame rows with a boolean function, the way you can with, e.g., the ES6 filter function?
An extremely simplified example to illustrate the problem:
    import pandas as pd

    def filter_fn(row):
        if row['Name'] == 'Alisa' and row['Age'] > 24:
            return False
        return row

    d = {
        'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
                 'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
        'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
        'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77]}

    df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])
    # broadcast=True only works on older pandas; it was removed in pandas 1.0
    # in favour of result_type='broadcast'.
    df = df.apply(filter_fn, axis=1, broadcast=True)
    print(df)
I found something using apply(), but with a bool function this only returns rows filled with False/True, which is expected.
My workaround would be to return the row itself when the function result would be True and to return False otherwise. But this would require an additional filtering step afterwards.
            Name    Age  Score
    0      False  False  False
    1      Bobby     24     63
    2      jodha     23     55
    3       jack     22     74
    4      raghu     23     31
    5   Cathrine     24     77
    6      False  False  False
    7      Bobby     24     63
    8      kumar     22     42
    9      Alisa     23     62
    10      Alex     24     89
    11  Cathrine     24     77
Filter rows by condition: you can use df[df["Courses"] == 'Spark'] to filter rows by a condition in a pandas DataFrame. Note that this expression returns a new DataFrame with the selected rows.
The syntax for filtering rows by one condition is simple: dataframe[condition]. In Python, the equality operator is == (a double equals sign). Another way to achieve the same result is to use pandas method chaining.
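A minimal sketch of what such a chained filter might look like, next to plain boolean indexing. The small frame here is made-up sample data, and using `.loc` with a callable or `.query` is just one reading of "chaining", not necessarily what the quoted text had in mind:

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "Name": ["Alisa", "Bobby", "Alisa"],
    "Age": [26, 24, 23],
})

# Plain boolean indexing: dataframe[condition]
plain = df[df["Name"] == "Alisa"]

# Chain-friendly form 1: .loc accepts a callable that receives the frame
chained_loc = df.loc[lambda frame: frame["Name"] == "Alisa"]

# Chain-friendly form 2: .query takes the condition as a string expression
chained_query = df.query("Name == 'Alisa'")
```

All three select the same rows; the callable and `.query` forms are mostly convenient when the frame is an intermediate result that has no variable name yet.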
The filter() function subsets rows or columns of a DataFrame according to labels in the specified index. Note that this routine does not filter a DataFrame on its contents; the filter is applied to the labels of the index.
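To make the label-versus-content distinction concrete, here is a small sketch of `DataFrame.filter`. The index labels `row_a` and `row_b` are invented for the example:

```python
import pandas as pd

# Hypothetical frame with string index labels, only for illustration.
df = pd.DataFrame(
    {"Name": ["Alisa", "Bobby"], "Age": [26, 24], "Score": [85, 63]},
    index=["row_a", "row_b"],
)

# Select columns by exact label
cols = df.filter(items=["Name", "Score"])

# Select columns whose label contains a substring ("Age" contains "ge")
like_cols = df.filter(like="ge", axis=1)

# Select rows whose index label matches a regular expression
rows = df.filter(regex="_b$", axis=0)
```

None of these calls look at the cell values, so `filter()` cannot express a condition such as "Age greater than 24"; that needs boolean indexing.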
I think using a function here is unnecessary. It is better, and above all faster, to use boolean indexing:
    m = (df['Name'] == 'Alisa') & (df['Age'] > 24)
    print(m)
    0      True
    1     False
    2     False
    3     False
    4     False
    5     False
    6      True
    7     False
    8     False
    9     False
    10    False
    11    False
    dtype: bool

    # invert mask by ~
    df1 = df[~m]
For more complicated filtering, you could use a function which must return a boolean value:
    def filter_fn(row):
        if row['Name'] == 'Alisa' and row['Age'] > 24:
            return False
        else:
            return True

    df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])
    m = df.apply(filter_fn, axis=1)
    print(m)
    0     False
    1      True
    2      True
    3      True
    4      True
    5      True
    6     False
    7      True
    8      True
    9      True
    10     True
    11     True
    dtype: bool

    df1 = df[m]
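As a quick sanity check, the two approaches above can be run side by side on the data from the question to confirm they keep the same rows. The name `keep_row` is just an illustrative rename of the row predicate:

```python
import pandas as pd

d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77]}
df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])

def keep_row(row):
    # True means "keep this row"; drop Alisa rows with Age above 24
    return not (row['Name'] == 'Alisa' and row['Age'] > 24)

# Row-wise predicate: flexible, but calls Python code once per row
mask_apply = df.apply(keep_row, axis=1)

# Vectorized equivalent: the same condition built from whole columns
mask_vec = ~((df['Name'] == 'Alisa') & (df['Age'] > 24))

filtered_apply = df[mask_apply]
filtered_vec = df[mask_vec]
```

Both masks drop rows 0 and 6 and keep the other ten, including the Alisa row with Age 23. The `apply` version is worth reaching for only when the condition cannot reasonably be written with column operations.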