Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas reset_index after groupby.value_counts()

I am trying to groupby a column and compute value counts on another column.

import pandas as pd dftest = pd.DataFrame({'A':[1,1,1,1,1,1,1,1,1,2,2,2,2,2],                 'Amt':[20,20,20,30,30,30,30,40, 40,10, 10, 40,40,40]})  print(dftest) 

dftest looks like

    A  Amt 0   1   20 1   1   20 2   1   20 3   1   30 4   1   30 5   1   30 6   1   30 7   1   40 8   1   40 9   2   10 10  2   10 11  2   40 12  2   40 13  2   40 

perform grouping

grouper = dftest.groupby('A') df_grouped = grouper['Amt'].value_counts() 

which gives

   A  Amt 1  30     4    20     3    40     2 2  40     3    10     2 Name: Amt, dtype: int64 

what I want is to keep top two rows of each group

Also, I was perplexed by an error when I tried to reset_index

df_grouped.reset_index() 

which gives following error

df_grouped.reset_index() ValueError: cannot insert Amt, already exists

like image 999
muon Avatar asked Sep 29 '16 19:09

muon


People also ask

How do you reindex after Groupby?

To reset index after group by, at first group according to a column using groupby(). After that, use reset_index().

What does reset_index () do in pandas?

Pandas DataFrame reset_index() Method The reset_index() method allows you reset the index back to the default 0, 1, 2 etc indexes. By default this method will keep the "old" idexes in a column named "index", to avoid this, use the drop parameter.

Does Groupby preserve index?

The Groupby Rolling function does not preserve the original index and so when dates are the same within the Group, it is impossible to know which index value it pertains to from the original dataframe.


1 Answers

You need parameter name in reset_index, because Series name is same as name of one of levels of MultiIndex:

df_grouped.reset_index(name='count') 

Another solution is rename Series name:

print (df_grouped.rename('count').reset_index())     A  Amt  count 0  1   30      4 1  1   20      3 2  1   40      2 3  2   40      3 4  2   10      2 

More common solution instead value_counts is aggregate size:

df_grouped1 =  dftest.groupby(['A','Amt']).size().reset_index(name='count')  print (df_grouped1)    A  Amt  count 0  1   20      3 1  1   30      4 2  1   40      2 3  2   10      2 4  2   40      3 
like image 86
jezrael Avatar answered Sep 22 '22 15:09

jezrael