
pandas select rows in a multi-level index dataframe

Tags:

python

pandas

Why can the rows of a multi-level indexed Series be accessed by key, while the same access does not work for a DataFrame?

For example, I have the following dataframe:

  index_1 index_2  num_1  num_2
0       a       c      1      2
1       a       c      4      3
2       a       c      3      4
3       a       d      2      3
4       b       d      3      1
5       b       d      2      3

I now perform a groupby operation as follows:

group_single_col = test.groupby(['index_1', 'index_2'])['num_1'].max()

I have no problems doing:

group_single_col[('a')]

or

group_single_col[('a', 'c')]

which is an amazing functionality, allowing me to slice based on values of the different index levels.

However, if I do a groupby and extract more than one column, as follows:

group_double_col = test.groupby(['index_1', 'index_2'])[['num_1', 'num_2']].max()

the object that is returned is a DataFrame and although the multilevel index is realized, operations like:

group_double_col[('a')] 

fail.

I understand that in the first case a series is returned, and in the second case a dataframe is returned, but I still thought that the functionality should work fine with a dataframe.
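
To make the difference concrete, here is a minimal sketch that rebuilds the example frame and compares the two lookups. It assumes the frame is named test as in the snippets above; the helper names series_a, lookup_failed, frame_a and row_a_c are only illustrative. The point it shows is that [] on a Series looks the key up in the row index, whereas [] on a DataFrame looks it up in the columns, so row selection on the DataFrame goes through .loc instead.

```python
import pandas as pd

# Rebuild the example frame from the question.
test = pd.DataFrame({
    "index_1": ["a", "a", "a", "a", "b", "b"],
    "index_2": ["c", "c", "c", "d", "d", "d"],
    "num_1": [1, 4, 3, 2, 3, 2],
    "num_2": [2, 3, 4, 3, 1, 3],
})

group_single_col = test.groupby(["index_1", "index_2"])["num_1"].max()
group_double_col = test.groupby(["index_1", "index_2"])[["num_1", "num_2"]].max()

# On a Series, [] looks the key up in the (row) index.
series_a = group_single_col["a"]

# On a DataFrame, [] looks the key up in the columns, and there is no column "a".
try:
    group_double_col["a"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# .loc indexes rows first, so the same keys work on the DataFrame.
frame_a = group_double_col.loc["a"]          # all rows with index_1 == "a"
row_a_c = group_double_col.loc[("a", "c")]   # a single row, returned as a Series
```

With .loc the DataFrame behaves much like the Series does with plain brackets: a partial key selects every row under that first-level value, and a full tuple selects one row.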

The follow up question is, what workaround is there in the case of a dataframe?

Currently I do:

group_double_col[group_double_col.index.get_level_values('index_1') == 'a']

but I wonder whether there is a more efficient method.

lhay86 asked Nov 25 '25 20:11



1 Answer

You can use xs:

print (group_double_col.xs('a', axis=0, level=0))
         num_1  num_2
index_2              
c            4      4
d            2      3

print (group_double_col.xs('a', axis=0, level=0, drop_level=False))
                 num_1  num_2
index_1 index_2              
a       c            4      4
        d            2      3
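
For anyone who wants to run this end to end, the following is a rough, self-contained sketch. It rebuilds the frame from the question and checks that xs with drop_level=False picks the same rows as the boolean mask on get_level_values. The names via_xs, via_mask and via_d are illustrative only, and passing the level by name rather than by position is simply a choice made here.

```python
import pandas as pd

# Rebuild the example frame and the grouped result from the question.
test = pd.DataFrame({
    "index_1": ["a", "a", "a", "a", "b", "b"],
    "index_2": ["c", "c", "c", "d", "d", "d"],
    "num_1": [1, 4, 3, 2, 3, 2],
    "num_2": [2, 3, 4, 3, 1, 3],
})
group_double_col = test.groupby(["index_1", "index_2"])[["num_1", "num_2"]].max()

# Cross-section on the first level, keeping both index levels in the result.
via_xs = group_double_col.xs("a", level="index_1", drop_level=False)

# The boolean-mask workaround from the question, for comparison.
mask = group_double_col.index.get_level_values("index_1") == "a"
via_mask = group_double_col[mask]

# xs can also select on an inner level, which a plain partial key cannot do.
via_d = group_double_col.xs("d", level="index_2")
```

Selecting on the inner level is where xs is most convenient: via_d holds the rows for index_2 == 'd' across both values of index_1, indexed by index_1 alone.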
jezrael answered Nov 27 '25 10:11
