I am trying to reshape the result of a pandas groupby so that the primary group key ('course' in the example below) becomes the row index, the secondary group key ('student') supplies the column headers, and the aggregated values ('score') fill the corresponding cells.
So, for example, I would like to transform:
import pandas as pd
import numpy as np
data = {'course_id':[101,101,101,101,102,102,102,102] ,
'student_id':[1,1,2,2,1,1,2,2],
'score':[80,85,70,60,90,65,95,80]}
df = pd.DataFrame(data, columns=['course_id', 'student_id','score'])
Which I have grouped by course_id and student_id:
group = df.groupby(['course_id', 'student_id']).aggregate(np.mean)
g = pd.DataFrame(group)
Into something like this:
data = {'course':[101,102],'1':[82.5,77.5],'2':[65.0,87.5]}
g3 = pd.DataFrame(data, columns=['course', '1', '2'])
I have spent some time looking through the groupby documentation and have trawled Stack Overflow and the like, but I'm still not sure how to approach the problem. I would be very grateful if anyone could suggest a sensible way of achieving this for a largish dataset.
Many thanks!
>>> g.reset_index().pivot(index='course_id', columns='student_id', values='score')
student_id     1     2
course_id
101         82.5  65.0
102         77.5  87.5
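For comparison, here is a minimal sketch of two other ways to get the same wide layout from the question's data, without the intermediate `reset_index()`. The variable names `wide_unstack`, `wide_pivot` and `flat` are made up for illustration, and neither approach has been benchmarked here on a large dataset.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'course_id': [101, 101, 101, 101, 102, 102, 102, 102],
    'student_id': [1, 1, 2, 2, 1, 1, 2, 2],
    'score': [80, 85, 70, 60, 90, 65, 95, 80],
})

# Option 1: group, average, then move the inner index level
# (student_id) out of the row index and into the columns.
wide_unstack = (
    df.groupby(['course_id', 'student_id'])['score']
      .mean()
      .unstack('student_id')
)

# Option 2: pivot_table groups and aggregates in a single call.
wide_pivot = df.pivot_table(
    index='course_id', columns='student_id', values='score', aggfunc='mean'
)

# Optional: turn course_id back into an ordinary column and drop the
# 'student_id' label on the column axis, closer to the g3 layout.
flat = wide_pivot.reset_index().rename_axis(None, axis=1)

print(wide_unstack)
print(flat)
```

Unlike `pivot`, which raises an error when an index/column pair appears more than once, `pivot_table` aggregates duplicates itself, so it can be applied to the raw `df` directly.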