Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Slicing a MultiIndex DataFrame by multiple values from a specified level

I want to slice a MultiIndex DataFrame by multiple values from a secondary level. For example, in the following DataFrame:

                val1  val2
ind1 ind2 ind3            
1    6    s1      10     8
2    7    s1      20     6
3    8    s2      30     4
4    9    s2      50     2
5    10   s3      60     0

I wish to slice only the rows in which ind3 == s1 or ind3 == s3:

           val1  val2
ind1 ind2            
1    6       10     8
2    7       20     6
5    10      60     0

Best hypothetical option would be to pass multiple arguments to .xs, since it is possible to explicitly state the desired level.

I could obviously concat all the sliced-by-single-value DataFrames:

In[2]: pd.concat([df.xs('s1',level=2), df.xs('s3',level=2)])
Out[2]:
           val1  val2
ind1 ind2            
1    6       10     8
2    7       20     6
5    10      60     0

But (a) it's tedious and not so readable when using more than 2 values, and (b) for large DataFrames it's quite heavy (or at least heavier than a multi-value slicing option, if that exists).

Here's the code to build the example DataFrame:

import pandas as pd
df = pd.DataFrame({'ind1':[1,2,3,4,5], 'ind2':[6,7,8,9,10], 'ind3':['s1','s1','s2','s2','s3'], 'val1':[10,20,30,50,60], 'val2':[8,6,4,2,0]}).set_index(['ind1','ind2','ind3'])
like image 232
danielhadar Avatar asked Aug 04 '16 17:08

danielhadar


People also ask

How do you slice in MultiIndex?

You can slice a MultiIndex by providing multiple indexers. You can provide any of the selectors as if you are indexing by label, see Selection by Label, including slices, lists of labels, labels, and boolean indexers. You can use slice(None) to select all the contents of that level.

Can you slice a DataFrame?

Slicing a DataFrame in Pandas includes the following steps:Ensure Python is installed (or install ActivePython) Import a dataset. Create a DataFrame. Slice the DataFrame.


1 Answers

You can use an IndexSlice:

idx = pd.IndexSlice
result = df.loc[idx[:, :, ['s1', 's3']], idx[:]]
result.index = result.index.droplevel('ind3')
print(result)

Output:

           val1  val2
ind1 ind2            
1    6       10     8
2    7       20     6
5    10      60     0

The second line above can also be written as

result = df.loc(axis=0)[idx[:, :, ['s1', 's3']]]
like image 111
A. Garcia-Raboso Avatar answered Oct 10 '22 10:10

A. Garcia-Raboso