Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas multiindex dataframe - Selecting max from one index within multiindex

I've got a dataframe with a multi index of Year and Month like the following

     |     |Value
Year |Month|  
     |  1  |  3
1992 |  2  |  5
     |  3  |  8
     | ... | ...
1993 |  1  |  2
     | ... | ...

I'm trying to select the maximum Value for each year and put that in a DF like this:

     | Max
Year |  
1992 |  5
1993 |  2
     | ... 

There's not much info on multi-indexes, should I simply do a group by and apply or something similar to make it more simple?

like image 637
edumike Avatar asked Apr 02 '17 00:04

edumike


People also ask

How do you select a specific index in a DataFrame?

iloc selects rows based on an integer index. So, if you want to select the 5th row in a DataFrame, you would use df. iloc[[4]] since the first row is at index 0, the second row is at index 1, and so on.

How do you slice a MultiIndex panda?

You can slice a MultiIndex by providing multiple indexers. You can provide any of the selectors as if you are indexing by label, see Selection by Label, including slices, lists of labels, labels, and boolean indexers. You can use slice(None) to select all the contents of that level.

How do you find the index of a max value in a DataFrame?

Pandas DataFrame idxmax() Method The idxmax() method returns a Series with the index of the maximum value for each column. By specifying the column axis ( axis='columns' ), the idxmax() method returns a Series with the index of the maximum value for each row.

How does pandas handle multiple index columns?

A multi-index dataframe has multi-level, or hierarchical indexing. We can easily convert the multi-level index into the column by the reset_index() method. DataFrame. reset_index() is used to reset the index to default and make the index a column of the dataframe.


Video Answer


2 Answers

Exactly right:

df.groupby(level=0).apply(max)

In my sample DataFrame:

                     0
Caps Lower            
A    a     0  0.246490
           1 -1.265711
           2 -0.477415
           3 -0.355812
           4 -0.724521
     b     0 -0.409198
           1 -0.062552
           2 -0.731789
           3  1.131616
           4  0.085248
B    a     0  0.193948
           1  2.010710
           2  0.289300
           3  0.305373
           4  1.376965
     b     0  0.210522
           1  1.431279
           2 -0.247171
           3  0.899074
           4  0.639926

Result:

             0
Caps          
A     1.131616
B     2.010710

This is how I created the DataFrame, by the way:

df = pd.DataFrame(np.random.randn(5,4), columns = l)
df.columns = pd.MultiIndex.from_tuples(df.columns, names=['Caps','Lower'])
df = pd.DataFrame(df.unstack())
like image 71
mechanical_meat Avatar answered Oct 22 '22 11:10

mechanical_meat


Simplier solution is max only:

#bernie's sample data
df = df.max(level=0)
print (df)
             0
Caps          
A     1.131616
B     2.010710
like image 41
jezrael Avatar answered Oct 22 '22 09:10

jezrael