Pandas

Question

I am trying to filter a dataframe with a multi-index similar to the following.

import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.rand(8),
             index=[list('AABBCCDD'),
                    ['M', 'F']*4])
data['Count'] = [1,2,15,17,8,12,11,20]

I would like to select all rows where 'Count' for both 'M' and 'F' inside a given outer level index is greater than 10. So for the example dataframe, all 'B' and 'D' rows should be selected, but none of the other rows. The only way I can think to do this is by looping through the outer index, but as loops in pandas are almost never the best way to do things I'm thinking there must be a better solution.

BENY · Accepted Answer

groupby index then we using filter + all to get the all count more than the thresh

data.groupby(level=0).filter(lambda x : x['Count'].gt(10).all())
Out[495]: 
            0  Count
B M  0.232856     15
  F  0.536026     17
D M  0.375064     11
  F  0.795447     20

Inspired by Jpp using isin

s=data.Count.min(level=0).gt(10)
data.loc[data.index.get_level_values(0).isin(s[s].index)]

Pandas - Filter multi-index by condition on all values within index

Tags:

python

dataframe

elz

1 Answers

BENY

Recent Activity

Donate For Us