I am trying to groupby a column and compute value counts on another column.
import pandas as pd dftest = pd.DataFrame({'A':[1,1,1,1,1,1,1,1,1,2,2,2,2,2], 'Amt':[20,20,20,30,30,30,30,40, 40,10, 10, 40,40,40]}) print(dftest)
dftest looks like
A Amt 0 1 20 1 1 20 2 1 20 3 1 30 4 1 30 5 1 30 6 1 30 7 1 40 8 1 40 9 2 10 10 2 10 11 2 40 12 2 40 13 2 40
perform grouping
grouper = dftest.groupby('A') df_grouped = grouper['Amt'].value_counts()
which gives
A Amt 1 30 4 20 3 40 2 2 40 3 10 2 Name: Amt, dtype: int64
what I want is to keep top two rows of each group
Also, I was perplexed by an error when I tried to reset_index
df_grouped.reset_index()
which gives following error
df_grouped.reset_index() ValueError: cannot insert Amt, already exists
To reset index after group by, at first group according to a column using groupby(). After that, use reset_index().
Pandas DataFrame reset_index() Method The reset_index() method allows you reset the index back to the default 0, 1, 2 etc indexes. By default this method will keep the "old" idexes in a column named "index", to avoid this, use the drop parameter.
The Groupby Rolling function does not preserve the original index and so when dates are the same within the Group, it is impossible to know which index value it pertains to from the original dataframe.
You need parameter name
in reset_index
, because Series
name is same as name of one of levels of MultiIndex
:
df_grouped.reset_index(name='count')
Another solution is rename
Series
name:
print (df_grouped.rename('count').reset_index()) A Amt count 0 1 30 4 1 1 20 3 2 1 40 2 3 2 40 3 4 2 10 2
More common solution instead value_counts
is aggregate size
:
df_grouped1 = dftest.groupby(['A','Amt']).size().reset_index(name='count') print (df_grouped1) A Amt count 0 1 20 3 1 1 30 4 2 1 40 2 3 2 10 2 4 2 40 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With