Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert one column into multiple rows using python

Tags:

python

pandas

I have a dataset 'df' with 3 columns.

>> Original Data

    Student Id    Name  Marks
0       id_1    John    112
1       id_2    Rafs    181
2       id_2    Rafs    182
3       id_2    Rafs    183
4       id_3    Juan    222
5       id_3    Juan    312
6       id_3  Roller     21

Trying to keep the columns 'Student_Id' and 'Name' as it is but convert 'Marks' to multiple columns. Such that each unique 'Student_Id' and 'Name' will have a row of all possible Marks. Also we need not create columns manually but it should be dynamically created depending on the values.

>> Expected Output

    Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John     112    <NA>    <NA>
1       id_2    Rafs     181     182     183
2       id_3    Juan     222     312    <NA>
3       id_3  Roller      21    <NA>    <NA>

Sample data to replicate the input

import pandas as pd

data = [
    ["id_1", 'John', 112],
    ["id_2", 'Rafs', 181],
    ["id_2", 'Rafs', 182],
    ["id_2", 'Rafs', 183], 
    ["id_3", 'Juan', 222],
    ["id_3", 'Juan', 312],
    ["id_3", 'Roller', 21]
]
df = pd.DataFrame(data, columns = ['Student Id', 'Name', 'Marks'])

I tried the below but I am not getting the desired output. It gives results in brackets() an also the Marks is missing.

df3 = df.pivot_table(index=['Student Id','Name'], columns='Marks', aggfunc = 'max')

>>Output
Empty DataFrame
Columns: []
Index: [(id_1, John), (id_2, Rafs), (id_3, Juan), (id_3, Roller)]
like image 208
Deb Avatar asked Dec 30 '22 14:12

Deb


2 Answers

Use GroupBy.cumcount for new column for counter columns created MultiIndex in df3:

df['g'] = df.groupby(['Student Id','Name']).cumcount().add(1)

df3 = (df.pivot_table(index=['Student Id','Name'], 
                     columns='g', 
                     values='Marks', 
                     aggfunc = 'max')
        .add_prefix('Marks')
        .rename_axis(None, axis=1)
        .reset_index())
print (df3)
  Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John   112.0     NaN     NaN
1       id_2    Rafs   181.0   182.0   183.0
2       id_3    Juan   222.0   312.0     NaN
3       id_3  Roller    21.0     NaN     NaN

If need integers with missing values:

df['g'] = df.groupby(['Student Id','Name']).cumcount().add(1)

df3 = (df.pivot_table(index=['Student Id','Name'], 
                     columns='g', 
                     values='Marks', 
                     aggfunc = 'max')
        .add_prefix('Marks')
        .astype('Int64')
        .rename_axis(None, axis=1)
        .reset_index())
print (df3)
  Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John     112    <NA>    <NA>
1       id_2    Rafs     181     182     183
2       id_3    Juan     222     312    <NA>
3       id_3  Roller      21    <NA>    <NA>
like image 200
jezrael Avatar answered Jan 02 '23 03:01

jezrael


another way:

temp = df.groupby(["Student Id", "Name"]).Marks.agg(list)

out = (pd.DataFrame(temp.tolist(), index=temp.index)
           .rename(columns=lambda x: f"Marks{x+1}")
           .reset_index())

temp will be a dataframe with aggregated lists of Marks per (id, name). Then we form a dataframe out of it while renaming the columns to desired format and resetting the index to put id & name to columns back.

to get

  Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John     112     NaN     NaN
1       id_2    Rafs     181   182.0   183.0
2       id_3    Juan     222   312.0     NaN
3       id_3  Roller      21     NaN     NaN
like image 45
Mustafa Aydın Avatar answered Jan 02 '23 02:01

Mustafa Aydın