I have a dataset 'df' with 3 columns.
>> Original Data
Student Id Name Marks
0 id_1 John 112
1 id_2 Rafs 181
2 id_2 Rafs 182
3 id_2 Rafs 183
4 id_3 Juan 222
5 id_3 Juan 312
6 id_3 Roller 21
Trying to keep the columns 'Student_Id' and 'Name' as it is but convert 'Marks' to multiple columns. Such that each unique 'Student_Id' and 'Name' will have a row of all possible Marks. Also we need not create columns manually but it should be dynamically created depending on the values.
>> Expected Output
Student Id Name Marks1 Marks2 Marks3
0 id_1 John 112 <NA> <NA>
1 id_2 Rafs 181 182 183
2 id_3 Juan 222 312 <NA>
3 id_3 Roller 21 <NA> <NA>
Sample data to replicate the input
import pandas as pd
data = [
["id_1", 'John', 112],
["id_2", 'Rafs', 181],
["id_2", 'Rafs', 182],
["id_2", 'Rafs', 183],
["id_3", 'Juan', 222],
["id_3", 'Juan', 312],
["id_3", 'Roller', 21]
]
df = pd.DataFrame(data, columns = ['Student Id', 'Name', 'Marks'])
I tried the below but I am not getting the desired output. It gives results in brackets() an also the Marks is missing.
df3 = df.pivot_table(index=['Student Id','Name'], columns='Marks', aggfunc = 'max')
>>Output
Empty DataFrame
Columns: []
Index: [(id_1, John), (id_2, Rafs), (id_3, Juan), (id_3, Roller)]
Use GroupBy.cumcount
for new column for counter columns created MultiIndex
in df3
:
df['g'] = df.groupby(['Student Id','Name']).cumcount().add(1)
df3 = (df.pivot_table(index=['Student Id','Name'],
columns='g',
values='Marks',
aggfunc = 'max')
.add_prefix('Marks')
.rename_axis(None, axis=1)
.reset_index())
print (df3)
Student Id Name Marks1 Marks2 Marks3
0 id_1 John 112.0 NaN NaN
1 id_2 Rafs 181.0 182.0 183.0
2 id_3 Juan 222.0 312.0 NaN
3 id_3 Roller 21.0 NaN NaN
If need integers with missing values:
df['g'] = df.groupby(['Student Id','Name']).cumcount().add(1)
df3 = (df.pivot_table(index=['Student Id','Name'],
columns='g',
values='Marks',
aggfunc = 'max')
.add_prefix('Marks')
.astype('Int64')
.rename_axis(None, axis=1)
.reset_index())
print (df3)
Student Id Name Marks1 Marks2 Marks3
0 id_1 John 112 <NA> <NA>
1 id_2 Rafs 181 182 183
2 id_3 Juan 222 312 <NA>
3 id_3 Roller 21 <NA> <NA>
another way:
temp = df.groupby(["Student Id", "Name"]).Marks.agg(list)
out = (pd.DataFrame(temp.tolist(), index=temp.index)
.rename(columns=lambda x: f"Marks{x+1}")
.reset_index())
temp
will be a dataframe with aggregated lists of Marks
per (id, name)
. Then we form a dataframe out of it while renaming the columns to desired format and resetting the index to put id & name to columns back.
to get
Student Id Name Marks1 Marks2 Marks3
0 id_1 John 112 NaN NaN
1 id_2 Rafs 181 182.0 183.0
2 id_3 Juan 222 312.0 NaN
3 id_3 Roller 21 NaN NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With