Pandas dataframe with MultiIndex: exclude level values

I have a multi-indexed pandas dataframe like the following one.

import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'bar', 'bar', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array(['blo', 'bla', 'bla', 'blo', 'blo', 'blu', 'blo', 'bla'])]

df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

df.sort_index(inplace=True)

Printing df then shows something like this (the values are random):

                    0         1         2         3
bar one bla  0.478461  1.030308  0.012688  0.137495
        blo  0.476041 -1.679848  1.346798  0.143225
    two bla  1.148882 -2.074197 -2.567959  1.258016
        blo  1.062280  3.846096 -0.346636  1.170822
foo one blo -0.761327  0.262105  0.151554  1.066616
    two blu  1.431951  0.043307 -0.326498  2.402536
qux one blo -0.622017 -0.566930  0.417977 -0.345238
    two bla  0.129273 -0.181396 -0.758381  0.995827

Now I want to select a subset by using a slice object:

idx = pd.IndexSlice
subset = df.loc[idx[['bar'], :, :], :]

This returns:

                    0         1         2         3
bar one bla  0.478461  1.030308  0.012688  0.137495
        blo  0.476041 -1.679848  1.346798  0.143225
    two bla  1.148882 -2.074197 -2.567959  1.258016
        blo  1.062280  3.846096 -0.346636  1.170822

Now I want to exclude all rows that have "blo" as the value in the third index level. I know that I could explicitly select every value except 'blo', but my real dataframe is very big and I only know the level values that should not appear in the subset.

What's the easiest way to exclude certain level values from the subset?

Thanks in advance!

Cord Kaldemeyer asked Jan 14 '16 11:01



2 Answers

IIUC, maybe you can mask your subset with:

subset = subset.iloc[subset.index.get_level_values(2) != 'blo']
Fabio Lamanna answered Sep 19 '22 06:09
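
The question mentions knowing several level values that should not appear. As a minimal sketch, the mask from the answer above could be extended with `isin` and negated with `~`. The `unwanted` list and the `filtered` name are hypothetical, chosen only for illustration; the dataframe is rebuilt from the question so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the dataframe from the question (values are random).
arrays = [np.array(['bar', 'bar', 'bar', 'bar', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array(['blo', 'bla', 'bla', 'blo', 'blo', 'blu', 'blo', 'bla'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays).sort_index()

idx = pd.IndexSlice
subset = df.loc[idx[['bar'], :, :], :]

# Hypothetical list of level values to exclude (an assumption for this sketch).
unwanted = ['blo', 'blu']

# True for rows whose third-level value is NOT in the unwanted list.
mask = ~subset.index.get_level_values(2).isin(unwanted)
filtered = subset.loc[mask]

print(filtered)
```

Only the rows whose third index level is outside `unwanted` remain, so the values to keep never need to be listed.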



You can do it this way:

In [263]:
subset.loc[subset.index.get_level_values(2) != 'blo']

Out[263]:
                    0         1         2         3
bar one bla -1.039335 -1.124656  0.057114 -0.284754
    two bla  0.007208 -0.403559 -1.317075 -0.340171
EdChum answered Sep 21 '22 06:09
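
Another option that may be worth trying, not taken from either answer, is `DataFrame.drop` with its `level` argument, which removes rows by label at a given index level. This is only a sketch under the same setup as the question; the names `filtered_subset` and `filtered_df` are hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the dataframe from the question (values are random).
arrays = [np.array(['bar', 'bar', 'bar', 'bar', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array(['blo', 'bla', 'bla', 'blo', 'blo', 'blu', 'blo', 'bla'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays).sort_index()

idx = pd.IndexSlice
subset = df.loc[idx[['bar'], :, :], :]

# Drop every row whose third index level (level=2) equals 'blo'.
filtered_subset = subset.drop('blo', level=2)

# The same call works on the full dataframe, without slicing first.
filtered_df = df.drop('blo', level=2)

print(filtered_subset)
print(filtered_df)
```

Note that `drop` raises a `KeyError` if the label is not present at that level; passing `errors='ignore'` suppresses this.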
