
How to remove the multiindex from GroupBy.apply()?

Based off this question.

import pandas
df = pandas.DataFrame([[2001, "Jack", 77], [2005, "Jack", 44], [2001, "Jill", 93]], columns=['Year', 'Name', 'Value'])

    Year    Name    Value
0   2001    Jack    77
1   2005    Jack    44
2   2001    Jill    93

For each unique Name, I would like to keep the row with the largest Year value. In the above example I would like to get the table

    Year    Name    Value
0   2005    Jack    44
1   2001    Jill    93

I tried solving this with groupby + apply:

df.groupby('Name', as_index=False)\
     .apply(lambda x: x.sort_values('Value').head(1))

     Year  Name  Value
0 1  2005  Jack     44
1 2  2001  Jill     93

Not the best approach, but I'm more interested in what is happening, and why. The result has a MultiIndex that looks like this:

MultiIndex(levels=[[0, 1], [1, 2]],
           labels=[[0, 1], [0, 1]])

I'm not looking for a workaround. I'm actually more interested to know why this happens, and how I can prevent it without changing my approach.

cs95 asked Oct 11 '17 01:10


1 Answer

IIUC, use group_keys=False so that apply does not prepend the group key to the index:

df.groupby('Name', group_keys=False).apply(lambda x: x.sort_values('Value').head(1))

Output:

   Year  Name  Value
1  2005  Jack     44
2  2001  Jill     93
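
As for why the extra level appears, here is a minimal sketch that compares the index under three settings. It assumes the sample frame from the question; the helper name smallest_value_row is made up for illustration and just wraps the same lambda. Note that sorting by Value and taking head(1) picks the smallest Value per name, which on this sample happens to coincide with the largest Year. Recent pandas releases may also emit a deprecation warning about apply operating on the grouping columns.

```python
import pandas as pd

df = pd.DataFrame(
    [[2001, "Jack", 77], [2005, "Jack", 44], [2001, "Jill", 93]],
    columns=["Year", "Name", "Value"],
)


def smallest_value_row(group):
    # Hypothetical helper: same logic as the lambda above. It returns one
    # row per group, still carrying that row's original label.
    return group.sort_values("Value").head(1)


# Default (group_keys=True): apply prepends the group key to each piece's
# index, so the result gets a two-level index of (key, original label).
with_keys = df.groupby("Name").apply(smallest_value_row)

# as_index=False: the outer level holds the groups' integer positions
# instead of the key labels, which is the MultiIndex seen in the question.
with_positions = df.groupby("Name", as_index=False).apply(smallest_value_row)

# group_keys=False: nothing is prepended; only the original labels remain.
flat = df.groupby("Name", group_keys=False).apply(smallest_value_row)

print(with_keys.index.tolist())       # [('Jack', 1), ('Jill', 2)]
print(with_positions.index.tolist())  # [(0, 1), (1, 2)]
print(flat.index.tolist())            # [1, 2]
```

So the inner level is just the original row label that head(1) carries along, and the outer level is whatever apply adds to identify the group. If a fresh 0..n-1 index is wanted, as in the target table, calling .reset_index(drop=True) on the flat result is one option.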
Scott Boston answered Oct 22 '22 09:10