Consider a dataframe like the following.
import pandas as pd
# Initialize dataframe
df1 = pd.DataFrame(columns=['bar', 'foo'])
df1['bar'] = ['001', '001', '001', '001', '002', '002', '003', '003', '003']
df1['foo'] = [-1, 0, 2, 3, -8, 1, 0, 1, 2]
>>> print(df1)
bar foo
0 001 -1
1 001 0
2 001 2
3 001 3
4 002 -8
5 002 1
6 003 0
7 003 1
8 003 2
# Lower and upper bound for desired range
lower_bound = -5
upper_bound = 5
I would like to use groupby in Pandas to return a dataframe that filters out all rows for any bar that meets a condition. In particular, I would like to filter out every row with a given bar if at least one of the values of foo for that bar is not between lower_bound and upper_bound.
In the above example, rows with bar = 002 should be filtered out since not all of the rows with bar = 002 contain a value of foo between -5 and 5 (namely, row index 4 contains foo = -8). The desired output for this example is the following.
# Desired output
bar foo
0 001 -1
1 001 0
2 001 2
3 001 3
6 003 0
7 003 1
8 003 2
I have tried the following approach.
# Attempted solution
grouped = df1.groupby('bar')['foo']
grouped.filter(lambda x: x < lower_bound or x > upper_bound)
However, this raises a TypeError: the filter must return a boolean result. Furthermore, this approach might return a groupby object, whereas I want the result to be a dataframe.
With pandas you generally should not use Python's and and or on a Series; use the vectorized & and | operators instead. For your case, build the condition element-wise and then call all() inside the filter so that the function returns a single boolean per group. This keeps each bar whose foo values are all between lower_bound and upper_bound:
df1.groupby('bar').filter(lambda x: ((x.foo >= lower_bound) & (x.foo <= upper_bound)).all())
# bar foo
#0 001 -1
#1 001 0
#2 001 2
#3 001 3
#6 003 0
#7 003 1
#8 003 2
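As an alternative to the filter above, here is a minimal sketch that avoids the per-group lambda by using Series.between together with groupby(...).transform('all'). The names in_range, keep, result and filtered are only illustrative and are not from the question. It assumes the bounds are meant to be inclusive, which is the default for between.

```python
import pandas as pd

# Same data as in the question
df1 = pd.DataFrame({
    'bar': ['001', '001', '001', '001', '002', '002', '003', '003', '003'],
    'foo': [-1, 0, 2, 3, -8, 1, 0, 1, 2],
})
lower_bound = -5
upper_bound = 5

# Flag each row whose foo lies in [lower_bound, upper_bound] (inclusive)
in_range = df1['foo'].between(lower_bound, upper_bound)

# For every row, True only if all rows sharing its bar are in range
keep = in_range.groupby(df1['bar']).transform('all')

# Boolean indexing returns a dataframe with the original index preserved
result = df1[keep]

# The filter-based version from the answer, written with between
filtered = df1.groupby('bar').filter(
    lambda g: g['foo'].between(lower_bound, upper_bound).all()
)

print(result)
```

Both versions select the same rows here. The transform version builds a plain boolean mask, so it can be combined with other row-level conditions using & and | before indexing.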