Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas DataFrame - How to retrieve specific combinations of MultiIndex levels

I have the following DataFrame that uses a three-level MultiIndex:

In [1]: iterables = [[1, 2], ['foo', 'bar'], ['one', 'two']]
   ...: midx = pd.MultiIndex.from_product(iterables)
   ...: df = pd.DataFrame(np.random.randn(8), index=midx)
   ...: df

Out[1]:
                  0
1 foo one -0.217594
      two -1.361612
  bar one  2.477790
      two  0.874409
2 foo one  0.403577
      two  0.076111
  bar one  1.423512
      two  0.047898

I'd like to slice the index such that I keep all of the first level while only keeping the following combinations of the second two levels: ('foo', 'one') and ('bar', 'two'). That is, I'd like my output to look something like this:

                  0
1 foo one -0.217594
  bar two  0.874409
2 foo one  0.403577
  bar two  0.047898

Is it possible to do this in one line, using an attribute such as .loc, for example?


I know that I can take cross-sections of the desired combinations separately using the .xs function, but I'd prefer a shorter, more slice-like syntax. Specifically, having a one-liner is important for my use case.

It seems like the following should work:

df.loc[[(slice(None), 'foo', 'one'), (slice(None), 'bar', 'two')]]

But this results in a TypeError: unhashable type: 'slice'.

like image 905
Wesley Smith Avatar asked Oct 14 '18 00:10

Wesley Smith


People also ask

How do you get unique two column combinations in Pandas?

2. pandas Get Unique Values in Column. Unique is also referred to as distinct, you can get unique values in the column using pandas Series. unique() function, since this function needs to call on the Series object, use df['column_name'] to get the unique values as a Series.


2 Answers

You can construct a Boolean mask by first dropping the first index level and then using pd.Index.isin with a list of tuples:

df_masked = df[df.index.droplevel(0).isin([('foo', 'one'), ('bar', 'two')])]

print(df_masked)

                  0
1 foo one  1.510316
  bar two  0.260862
2 foo one  0.813745
  bar two  0.023386
like image 198
jpp Avatar answered Oct 16 '22 11:10

jpp


Here's a solution based on DataFrame.query. Your MultiIndex has unnamed levels, but level k can be accessed with the special name ilevel_k, according to the docs: "If the levels of the MultiIndex are unnamed, you can refer to them using special names."

query_string = ('(ilevel_1 == "foo" & ilevel_2 == "one") | '
                '(ilevel_1 == "bar" & ilevel_2 == "two")')

df.query(query_string)
                  0
1 foo one -0.217594
  bar two  0.874409
2 foo one  0.403577
  bar two  0.047898
like image 34
Peter Leimbigler Avatar answered Oct 16 '22 12:10

Peter Leimbigler