I am trying to filter a dataframe with a multi-index similar to the following.
import numpy as np
import pandas as pd
data = pd.DataFrame(np.random.rand(8),
index=[list('AABBCCDD'),
['M', 'F']*4])
data['Count'] = [1,2,15,17,8,12,11,20]
I would like to select all rows where 'Count' for both 'M' and 'F' inside a given outer level index is greater than 10. So for the example dataframe, all 'B' and 'D' rows should be selected, but none of the other rows. The only way I can think to do this is by looping through the outer index, but as loops in pandas are almost never the best way to do things I'm thinking there must be a better solution.
groupby
index then we using filter
+ all
to get the all count more than the thresh
data.groupby(level=0).filter(lambda x : x['Count'].gt(10).all())
Out[495]:
0 Count
B M 0.232856 15
F 0.536026 17
D M 0.375064 11
F 0.795447 20
Inspired by Jpp using isin
s=data.Count.min(level=0).gt(10)
data.loc[data.index.get_level_values(0).isin(s[s].index)]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With