Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas - Filter multi-index by condition on all values within index

I am trying to filter a dataframe with a multi-index similar to the following.

import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.rand(8),
             index=[list('AABBCCDD'),
                    ['M', 'F']*4])
data['Count'] = [1,2,15,17,8,12,11,20]

I would like to select all rows where 'Count' for both 'M' and 'F' inside a given outer level index is greater than 10. So for the example dataframe, all 'B' and 'D' rows should be selected, but none of the other rows. The only way I can think to do this is by looping through the outer index, but as loops in pandas are almost never the best way to do things I'm thinking there must be a better solution.

like image 340
elz Avatar asked Dec 17 '22 23:12

elz


1 Answers

groupby index then we using filter + all to get the all count more than the thresh

data.groupby(level=0).filter(lambda x : x['Count'].gt(10).all())
Out[495]: 
            0  Count
B M  0.232856     15
  F  0.536026     17
D M  0.375064     11
  F  0.795447     20

Inspired by Jpp using isin

s=data.Count.min(level=0).gt(10)
data.loc[data.index.get_level_values(0).isin(s[s].index)]
like image 199
BENY Avatar answered May 19 '23 01:05

BENY