Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas - GroupBy 2 Columns - Unable to reset index

I have a DF as follows:

Date Bought | Fruit
2018-01       Apple
2018-02       Orange
2018-02       Orange
2018-02       Lemon

I wish to group the data by 'Date Bought' & 'Fruit' and count the occurrences.

Expected result:

Date Bought | Fruit | Count
2018-01       Apple     1
2018-02       Orange    2
2018-02       Lemon     1

What I get:

Date Bought | Fruit | Count
2018-01       Apple     1
2018-02       Orange    2
              Lemon     1

Code used:

Initial attempt:
df.groupby(['Date Bought','Fruit'])['Fruit'].agg('count')

#2
df.groupby(['Date Bought','Fruit'])['Fruit'].agg('count').reset_index()
ERROR: Cannot insert Fruit, already exists

#3
df.groupby(['Date Bought','Fruit'])['Fruit'].agg('count').reset_index(inplace=True)
ERROR: Type Error: Cannot reset_index inplace on a Series to create a DataFrame

Documentation shows that the groupby function returns a 'groupby object' not a standard DF. How can I group the data as mentioned above and retain the DF format?

like image 490
SheerKahn Avatar asked Mar 02 '23 16:03

SheerKahn


1 Answers

The problem here is that by resetting the index you'd end up with 2 columns with the same name. Because working with Series is possible set parameter name in Series.reset_index:

df1 = (df.groupby(['Date Bought','Fruit'], sort=False)['Fruit']
         .agg('count')
         .reset_index(name='Count'))
print (df1)
  Date Bought   Fruit  Count
0     2018-01   Apple      1
1     2018-02  Orange      2
2     2018-02   Lemon      1
like image 96
jezrael Avatar answered Mar 05 '23 17:03

jezrael