I want to know why the rows of a multilevel-indexed Series can be accessed by key, while the same access does not work for DataFrames.
For example, I have the following dataframe:
  index_1 index_2  num_1  num_2
0       a       c      1      2
1       a       c      4      3
2       a       c      3      4
3       a       d      2      3
4       b       d      3      1
5       b       d      2      3
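For reproducibility, here is a minimal sketch that rebuilds this frame. It assumes the unnamed leftmost column is just the default integer index, and that the frame is called test as in the groupby calls that follow.

```python
import pandas as pd

# Rebuild the example frame; the 0..5 row labels are the default RangeIndex.
test = pd.DataFrame({
    "index_1": ["a", "a", "a", "a", "b", "b"],
    "index_2": ["c", "c", "c", "d", "d", "d"],
    "num_1": [1, 4, 3, 2, 3, 2],
    "num_2": [2, 3, 4, 3, 1, 3],
})

print(test)
```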
I now perform a groupby operation as follows:
group_single_col = test.groupby(['index_1', 'index_2'])['num_1'].max()
I have no problems doing:
group_single_col[('a')]
or
group_single_col[('a', 'c')]
which is very convenient, since it lets me select rows based on the values of the different index levels.
However, if I do a groupby and extract more than one column, as follows:
group_double_col = test.groupby(['index_1', 'index_2'])[['num_1', 'num_2']].max()
the object that is returned is a DataFrame and although the multilevel index is realized, operations like:
group_double_col[('a')]
fail.
I understand that in the first case a series is returned, and in the second case a dataframe is returned, but I still thought that the functionality should work fine with a dataframe.
The follow-up question is: what workaround is there in the case of a DataFrame?
Currently I do:
group_double_col[(group_double_col.index.get_level_values('index_1') == 'a')]
but I wonder whether there is a more efficient method.
You can use xs:
print(group_double_col.xs('a', axis=0, level=0))
         num_1  num_2
index_2
c            4      4
d            2      3
print(group_double_col.xs('a', axis=0, level=0, drop_level=False))
                 num_1  num_2
index_1 index_2
a       c            4      4
        d            2      3
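As for why the plain bracket lookup fails: on a DataFrame, [] with a single key looks that key up among the columns, whereas on a Series it looks it up in the index. Row selection by index label goes through .loc, which can serve as an alternative to xs. The following is a minimal sketch under that assumption; the frame is rebuilt from the question's data, and the names column_lookup_failed, rows_a, row_a_c and rows_a_full are only illustrative.

```python
import pandas as pd

# Rebuild the example from the question.
test = pd.DataFrame({
    "index_1": ["a", "a", "a", "a", "b", "b"],
    "index_2": ["c", "c", "c", "d", "d", "d"],
    "num_1": [1, 4, 3, 2, 3, 2],
    "num_2": [2, 3, 4, 3, 1, 3],
})
group_double_col = test.groupby(["index_1", "index_2"])[["num_1", "num_2"]].max()

# Plain [] on a DataFrame searches the columns (num_1, num_2), so 'a' is not found.
try:
    group_double_col["a"]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# .loc selects rows, so values of the index levels work as keys.
rows_a = group_double_col.loc["a"]          # first level is dropped, like xs('a', level=0)
row_a_c = group_double_col.loc[("a", "c")]  # one row, returned as a Series
rows_a_full = group_double_col.loc[["a"]]   # a list key keeps both index levels

print(rows_a)
print(row_a_c)
print(rows_a_full)
```

One difference worth noting: .loc matches keys starting from the outermost index level, so xs with an explicit level argument remains the simpler choice when selecting on an inner level such as index_2.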