I'm suspicious that this is trivial, but I yet to discover the incantation that will let me select rows from a Pandas dataframe based on the values of a hierarchical key. So, for example, imagine we have the following dataframe:
import pandas df = pandas.DataFrame({'group1': ['a','a','a','b','b','b'], 'group2': ['c','c','d','d','d','e'], 'value1': [1.1,2,3,4,5,6], 'value2': [7.1,8,9,10,11,12] }) df = df.set_index(['group1', 'group2'])
df looks as we would expect:
If df were not indexed on group1 I could do the following:
df['group1' == 'a']
But that fails on this dataframe with an index. So maybe I should think of this like a Pandas series with a hierarchical index:
df['a','c']
Nope. That fails as well.
So how do I select out all the rows where:
To make the column an index, we use the Set_index() function of pandas. If we want to make one column an index, we can simply pass the name of the column as a string in set_index(). If we want to do multi-indexing or Hierarchical Indexing, we pass the list of column names in the set_index().
Try using xs
to be very precise:
In [5]: df.xs('a', level=0) Out[5]: value1 value2 group2 c 1.1 7.1 c 2.0 8.0 d 3.0 9.0 In [6]: df.xs('c', level='group2') Out[6]: value1 value2 group1 a 1.1 7.1 a 2.0 8.0
Syntax like the following will work:
df.ix['a'] df.ix['a'].ix['c']
since group1
and group2
are indices. Please forgive my previous attempt!
To get at the second index only, I think you have to swap indices:
df.swaplevel(0,1).ix['c']
But I'm sure Wes will correct me if I'm wrong.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With