I have a df with multi-indexed columns, like this:
col = pd.MultiIndex.from_arrays([['one', '', '', 'two', 'two', 'two'],
                                ['a', 'b', 'c', 'd', 'e', 'f']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)
data

I want to be able to select all rows where the values in one of the level 1 columns pass a certain test. If there were no multi-index on the columns I would say something like:
data[data['d']<1]
But of course that fails on a multindex. The level 1 indexes are unique, so I don't want to have to specify the level 0 index, just level 1. I'd like to return the table above but missing row 1, where d>1.
If values are unique in second level id necessary convert mask from one column DataFrame to Series - possible solution with DataFrame.squeeze:
np.random.seed(2019)
col = pd.MultiIndex.from_arrays([['one', '', '', 'two', 'two', 'two'],
                                ['a', 'b', 'c', 'd', 'e', 'f']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)
print (data.xs('d', axis=1, level=1))
        two
0  1.331864
1  0.953490
2 -0.189313
3  0.064969
print (data.xs('d', axis=1, level=1).squeeze())
0    1.331864
1    0.953490
2   -0.189313
3    0.064969
Name: two, dtype: float64
print (data.xs('d', axis=1, level=1).squeeze().lt(1))
0    False
1     True
2     True
3     True
Name: two, dtype: bool
df = data[data.xs('d', axis=1, level=1).squeeze().lt(1)]
Alternative with DataFrame.iloc:
df = data[data.xs('d', axis=1, level=1).iloc[:, 0].lt(1)]
print (df)
        one                           two                    
          a         b         c         d         e         f
1  0.573761  0.287728 -0.235634  0.953490 -1.689625 -0.344943
2  0.016905 -0.514984  0.244509 -0.189313  2.672172  0.464802
3  0.845930 -0.503542 -0.963336  0.064969 -3.205040  1.054969
If working with MultiIndex after select is possible get multiple columns, like here if select by c level:
np.random.seed(2019)
col = pd.MultiIndex.from_arrays([['one', '', '', 'two', 'two', 'two'],
                                ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)
So first select by DataFrame.xs and compare by DataFrame.lt for <
print (data.xs('c', axis=1, level=1))
                  two
0  1.481278  0.685609
1 -0.235634 -0.344943
2  0.244509  0.464802
3 -0.963336  1.054969
m = data.xs('c', axis=1, level=1).lt(1)
#alternative
#m = data.xs('c', axis=1, level=1) < 1
print (m)
            two
0  False   True
1   True   True
2   True   True
3   True  False
And then test if at least one True per rows by DataFrame.any and filter by boolean indexing:
df1 = data[m.any(axis=1)]
print (df1)
        one                           two                    
          a         b         c         a         b         c
0 -0.217679  0.821455  1.481278  1.331864 -0.361865  0.685609
1  0.573761  0.287728 -0.235634  0.953490 -1.689625 -0.344943
2  0.016905 -0.514984  0.244509 -0.189313  2.672172  0.464802
3  0.845930 -0.503542 -0.963336  0.064969 -3.205040  1.054969
Or test if all Trues per row by DataFrame.any with filtering:
df1 = data[m.all(axis=1)]
print (df1)
        one                           two                    
          a         b         c         a         b         c
1  0.573761  0.287728 -0.235634  0.953490 -1.689625 -0.344943
2  0.016905 -0.514984  0.244509 -0.189313  2.672172  0.464802
                        Using your supplied data, a combination of xs and squeeze can help with the filtering. This works on the assumption that the level 1 entries are unique, as indicated in your question :
np.random.seed(2019)
col = pd.MultiIndex.from_arrays([['one', '', '', 'two', 'two', 'two'],
                                ['a', 'b', 'c', 'd', 'e', 'f']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)
data
       one                                         two
         a          b         c            d            e          f
0   -0.217679   0.821455    1.481278    1.331864    -0.361865   0.685609
1   0.573761    0.287728    -0.235634   0.953490    -1.689625   -0.344943
2   0.016905    -0.514984   0.244509    -0.189313   2.672172    0.464802
3   0.845930    -0.503542   -0.963336   0.064969    -3.205040   1.054969
Say you want to filter for d less than 1 :
#squeeze turns it into a series, making it easy to pass to loc via boolean indexing
condition = data.xs('d',axis=1,level=1).lt(1).squeeze()
#or you could use loc : 
# condition = data.loc(axis=1)[:,'d'].lt(1).squeeze()
data.loc[condition]
        one                                              two
       a             b           c         d            e           f
1   0.573761    0.287728    -0.235634   0.953490    -1.689625   -0.344943
2   0.016905    -0.514984   0.244509    -0.189313   2.672172    0.464802
3   0.845930    -0.503542   -0.963336   0.064969    -3.205040   1.054969
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With