I want to slice a MultiIndex DataFrame by multiple values from a secondary level. For example, in the following DataFrame:
val1 val2
ind1 ind2 ind3
1 6 s1 10 8
2 7 s1 20 6
3 8 s2 30 4
4 9 s2 50 2
5 10 s3 60 0
I wish to slice only the rows in which ind3 == s1
or ind3 == s3
:
val1 val2
ind1 ind2
1 6 10 8
2 7 20 6
5 10 60 0
Best hypothetical option would be to pass multiple arguments to .xs
, since it is possible to explicitly state the desired level
.
I could obviously concat all the sliced-by-single-value DataFrames:
In[2]: pd.concat([df.xs('s1',level=2), df.xs('s3',level=2)])
Out[2]:
val1 val2
ind1 ind2
1 6 10 8
2 7 20 6
5 10 60 0
But (a) it's tedious and not so readable when using more than 2 values, and (b) for large DataFrames it's quite heavy (or at least heavier than a multi-value slicing option, if that exists).
Here's the code to build the example DataFrame:
import pandas as pd
df = pd.DataFrame({'ind1':[1,2,3,4,5], 'ind2':[6,7,8,9,10], 'ind3':['s1','s1','s2','s2','s3'], 'val1':[10,20,30,50,60], 'val2':[8,6,4,2,0]}).set_index(['ind1','ind2','ind3'])
You can slice a MultiIndex by providing multiple indexers. You can provide any of the selectors as if you are indexing by label, see Selection by Label, including slices, lists of labels, labels, and boolean indexers. You can use slice(None) to select all the contents of that level.
Slicing a DataFrame in Pandas includes the following steps:Ensure Python is installed (or install ActivePython) Import a dataset. Create a DataFrame. Slice the DataFrame.
You can use an IndexSlice
:
idx = pd.IndexSlice
result = df.loc[idx[:, :, ['s1', 's3']], idx[:]]
result.index = result.index.droplevel('ind3')
print(result)
Output:
val1 val2
ind1 ind2
1 6 10 8
2 7 20 6
5 10 60 0
The second line above can also be written as
result = df.loc(axis=0)[idx[:, :, ['s1', 's3']]]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With