Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Selecting rows from a Pandas dataframe with a compound (hierarchical) index

I'm suspicious that this is trivial, but I yet to discover the incantation that will let me select rows from a Pandas dataframe based on the values of a hierarchical key. So, for example, imagine we have the following dataframe:

import pandas df = pandas.DataFrame({'group1': ['a','a','a','b','b','b'],                        'group2': ['c','c','d','d','d','e'],                        'value1': [1.1,2,3,4,5,6],                        'value2': [7.1,8,9,10,11,12] }) df = df.set_index(['group1', 'group2']) 

df looks as we would expect:

enter image description here

If df were not indexed on group1 I could do the following:

df['group1' == 'a'] 

But that fails on this dataframe with an index. So maybe I should think of this like a Pandas series with a hierarchical index:

df['a','c'] 

Nope. That fails as well.

So how do I select out all the rows where:

  1. group1 == 'a'
  2. group1 == 'a' & group2 == 'c'
  3. group2 == 'c'
  4. group1 in ['a','b','c']
like image 453
JD Long Avatar asked Aug 13 '12 20:08

JD Long


People also ask

How do I create a hierarchical index in pandas?

To make the column an index, we use the Set_index() function of pandas. If we want to make one column an index, we can simply pass the name of the column as a string in set_index(). If we want to do multi-indexing or Hierarchical Indexing, we pass the list of column names in the set_index().


2 Answers

Try using xs to be very precise:

In [5]: df.xs('a', level=0) Out[5]:          value1  value2 group2                 c          1.1     7.1 c          2.0     8.0 d          3.0     9.0  In [6]: df.xs('c', level='group2') Out[6]:          value1  value2 group1                 a          1.1     7.1 a          2.0     8.0 
like image 183
Wes McKinney Avatar answered Sep 22 '22 09:09

Wes McKinney


Syntax like the following will work:

df.ix['a'] df.ix['a'].ix['c'] 

since group1 and group2 are indices. Please forgive my previous attempt!

To get at the second index only, I think you have to swap indices:

df.swaplevel(0,1).ix['c'] 

But I'm sure Wes will correct me if I'm wrong.

like image 22
Chris Fonnesbeck Avatar answered Sep 19 '22 09:09

Chris Fonnesbeck