
Is it possible to use a custom filter function in pandas?

Tags:

python

pandas

Can I use my helper function, which determines whether a shot was a three-pointer, as a filter function in pandas? My actual function is much more complex, but I simplified it for this question.

def isThree(x, y):
    return (x + y == 3)

print(data[isThree(data['x'], data['y'])].head())
JSells asked Apr 01 '19 20:04



2 Answers

Yes:

import numpy as np
import pandas as pd

data = pd.DataFrame({'x': np.random.randint(1,3,10),
                     'y': np.random.randint(1,3,10)})
print(data)

Output:

   x  y
0  1  2
1  2  1
2  2  1
3  1  2
4  2  1
5  2  1
6  2  1
7  2  1
8  2  1
9  2  2

def isThree(x, y):
    return (x + y == 3)

print(data[isThree(data['x'], data['y'])].head())

Output:

   x  y
0  1  2
1  2  1
2  2  1
3  1  2
4  2  1
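
To see why this works, it may help to look at the intermediate value. The sketch below uses a small fixed DataFrame (made-up values, chosen only so the result is reproducible) and stores the function's return value in a variable named mask, which is an illustrative name and not part of the question.

```python
import pandas as pd

def isThree(x, y):
    # With Series arguments, + and == operate element-wise,
    # so the result is a boolean Series rather than a single bool.
    return x + y == 3

# Illustrative data only; not taken from the question.
data = pd.DataFrame({'x': [1, 2, 2, 0], 'y': [2, 1, 2, 3]})

mask = isThree(data['x'], data['y'])  # boolean Series sharing data's index
filtered = data[mask]                 # keeps only the rows where mask is True

print(mask)
print(filtered)
```

Because the mask shares the DataFrame's index, data[mask] and data.loc[mask] should select the same rows here.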
Nathaniel answered Nov 02 '22 14:11


Yes. As long as your function returns a Boolean Series with the same index as the DataFrame, you can use its output to select rows from the original DataFrame. In this simple example, we can pass Series directly to your function:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 4, (30, 2)))
def isThree(x, y):
    return x + y == 3

df[isThree(df[0], df[1])]
#    0  1
#2   2  1
#5   2  1
#9   0  3
#11  2  1
#12  0  3
#13  2  1
#27  3  0
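
The question mentions that the real helper is more complex. If it only works on scalars (for example because it branches with if), passing whole Series to it would likely raise an error about ambiguous truth values. One possible workaround, sketched below, is to build the Boolean Series row by row with DataFrame.apply and axis=1. The function is_three_scalar, its extra negative-value check, and the sample data are hypothetical stand-ins, not the asker's actual code.

```python
import pandas as pd

def is_three_scalar(x, y):
    # Hypothetical scalar-only helper: the `if` below would fail on a
    # whole Series, so the function has to be called once per row.
    if x < 0 or y < 0:
        return False
    return x + y == 3

# Illustrative data only.
df = pd.DataFrame({'x': [1, 2, -1, 0], 'y': [2, 2, 4, 3]})

# apply(..., axis=1) passes each row to the lambda and collects the
# results into a Series that shares df's index.
mask = df.apply(lambda row: is_three_scalar(row['x'], row['y']), axis=1)
result = df[mask]

print(result)
```

Row-wise apply is usually much slower than a vectorized expression, so it is probably worth keeping the vectorized form whenever the helper can be written with Series operations.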
ALollz answered Nov 02 '22 15:11