Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the best way to slice a multiindex dataframe using a list of partial index tuples?

I want to slice a data frame using a partially matching index, or list of tuples.

_ix = [('foo','a', 1), ('foo','a', 2), ('foo','b', 1), 
       ('foo','b', 2), ('foo','c', 1), ('foo','c', 2)]
df = pd.DataFrame(np.ones((6, 1)), index=pd.MultiIndex.from_tuples(_ix))
print(df)

           0
foo a 1  1.0
      2  1.0
    b 1  1.0
      2  1.0
    c 1  1.0
      2  1.0

Given a query index like:

q_ix = [('foo', 'a'), ('foo', 'c')]

I want to obtain

           0
foo a 1  1.0
      2  1.0
    c 1  1.0
      2  1.0

I can get this by using pd.concat and a list comprehension...

df_sliced = pd.concat([df.loc[(*x, slice(None)), :] for x in q_ix])

...but this is super clunky when my query index is large. Is there no better way?

like image 862
PeterFoster Avatar asked Dec 12 '25 14:12

PeterFoster


2 Answers

Use pd.IndexSlice from pandas

import pandas as pd

idx = pd.IndexSlice


df.loc[idx[:, ['a', 'c']], :] # Can use 'foo' instead of : on the first lvl

outputs

           0
foo a 1  1.0
      2  1.0
    c 1  1.0
      2  1.0

It reads take everything along the first level (:) then on the second level grab ["a", "c"]. Which we enclose in idx to mark that it is a slice. Finally, the last : tells that we want all the columns.

like image 106
Nathan Furnal Avatar answered Dec 15 '25 05:12

Nathan Furnal


Here is one way

df.reset_index(level=2).loc[q_ix].set_index('level_2',append=True)
                 0
      level_2     
foo a 1        1.0
      2        1.0
    c 1        1.0
      2        1.0
like image 20
BENY Avatar answered Dec 15 '25 05:12

BENY