I have a data frame with a number of columns but with three I am interested in. These are name
, year
and goals_scored
. None of these columns are unique in that for example I have lines like the following:
Name Year Goals_scored
John Smith 2014 3
John Smith 2014 2
John Smith 2014 0
John Smith 2015 1
John Smith 2015 1
John Smith 2015 2
John Smith 2015 1
John Smith 2015 0
John Smith 2016 1
John Smith 2016 0
What I am trying to do is create a new data frame where I have 4 columns. One for name, then one for each of the years 2014, 2015 and 2016. The last three columns being a sum of the goals_scored for the year in question. So using the data above it would look like:
Name 2014 2015 2016
John Smith 5 5 1
To make it worse they only want it to include those names that have something for all three years.
Can anyone point me in the right direction?
Need groupby
, aggregate sum
and reshape by unstack
:
df = df.groupby(['Name','Year'])['Goals_scored'].sum().unstack()
print (df)
Year 2014 2015 2016
Name
John Smith 5 5 1
Alternative pivot_table
:
df = df.pivot_table(index='Name',columns='Year', values='Goals_scored', aggfunc='sum')
print (df)
Year 2014 2015 2016
Name
John Smith 5 5 1
Last for column from index:
df = df.reset_index().rename_axis(None, 1)
print (df)
Name 2014 2015 2016
0 John Smith 5 5 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With