Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas groupby() on one column and then sum on another

I have a data frame with a number of columns but with three I am interested in. These are name, year and goals_scored. None of these columns are unique in that for example I have lines like the following:

Name           Year     Goals_scored
John Smith     2014     3
John Smith     2014     2
John Smith     2014     0
John Smith     2015     1
John Smith     2015     1
John Smith     2015     2
John Smith     2015     1
John Smith     2015     0
John Smith     2016     1
John Smith     2016     0

What I am trying to do is create a new data frame where I have 4 columns. One for name, then one for each of the years 2014, 2015 and 2016. The last three columns being a sum of the goals_scored for the year in question. So using the data above it would look like:

Name          2014     2015     2016
John Smith    5        5        1

To make it worse they only want it to include those names that have something for all three years.

Can anyone point me in the right direction?

like image 575
SeagullWardy Avatar asked Oct 17 '17 11:10

SeagullWardy


1 Answers

Need groupby, aggregate sum and reshape by unstack:

df = df.groupby(['Name','Year'])['Goals_scored'].sum().unstack()
print (df)
Year        2014  2015  2016
Name                        
John Smith     5     5     1

Alternative pivot_table:

df = df.pivot_table(index='Name',columns='Year', values='Goals_scored', aggfunc='sum')
print (df)
Year        2014  2015  2016
Name                        
John Smith     5     5     1

Last for column from index:

df = df.reset_index().rename_axis(None, 1)
print (df)
         Name  2014  2015  2016
0  John Smith     5     5     1
like image 177
jezrael Avatar answered Sep 21 '22 23:09

jezrael