I have a DataFrame with multiple columns that I want to reshape from rows to columns (long to wide). Most solutions I have seen on Stack Overflow only deal with two columns.
From DF:

PO ID  PO Name  Region  Date     Price
1      AA       North   07/2016  100
2      BB       South   07/2016  200
1      AA       North   08/2016  300
2      BB       South   08/2016  400
1      AA       North   09/2016  500

To DF:

PO ID  PO Name  Region  07/2016  08/2016  09/2016
1      AA       North   100      300      500
2      BB       South   200      400      NaN
Use set_index with unstack:
df = df.set_index(['PO ID','PO Name','Region', 'Date'])['Price'].unstack()
print (df)
Date                  07/2016  08/2016  09/2016
PO ID PO Name Region
1     AA      North     100.0    300.0    500.0
2     BB      South     200.0    400.0      NaN
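For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The frame is rebuilt by hand from the sample in the question, and the names `long_df` and `wide_df` are only illustrative; it also assumes each PO ID / PO Name / Region / Date combination appears once.

```python
import pandas as pd

# Sample data copied from the question: one row per PO and month (long format).
long_df = pd.DataFrame({
    "PO ID": [1, 2, 1, 2, 1],
    "PO Name": ["AA", "BB", "AA", "BB", "AA"],
    "Region": ["North", "South", "North", "South", "North"],
    "Date": ["07/2016", "07/2016", "08/2016", "08/2016", "09/2016"],
    "Price": [100, 200, 300, 400, 500],
})

# Put the identifying columns and Date into the index, select Price,
# then move the innermost index level (Date) into the columns.
wide_df = long_df.set_index(["PO ID", "PO Name", "Region", "Date"])["Price"].unstack()

print(wide_df)
```

The missing 09/2016 price for PO 2 becomes NaN, which is why the Price values are shown as floats in the result.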
If there are duplicates in the key columns, you need an aggregate function, using either pivot_table or groupby:
print (df)
   PO ID PO Name Region     Date  Price
0      1      AA  North  07/2016    100 <- same PO ID, PO Name, Region and Date, different Price
1      1      AA  North  07/2016    500 <- same PO ID, PO Name, Region and Date, different Price
2      2      BB  South  07/2016    200
3      1      AA  North  08/2016    300
4      2      BB  South  08/2016    400
5      1      AA  North  09/2016    500
df = df.pivot_table(index=['PO ID','PO Name','Region'],
                    columns='Date',
                    values='Price',
                    aggfunc='mean')
print (df)
Date                  07/2016  08/2016  09/2016
PO ID PO Name Region
1     AA      North     300.0    300.0    500.0 <-(100+500)/2=300 for 07/2016
2     BB      South     200.0    400.0      NaN
df = df.groupby(['PO ID','PO Name','Region', 'Date'])['Price'].mean().unstack()
print (df)
Date                  07/2016  08/2016  09/2016
PO ID PO Name Region
1     AA      North     300.0    300.0    500.0 <-(100+500)/2=300 for 07/2016
2     BB      South     200.0    400.0      NaN
Finally, turn the index levels back into ordinary columns and remove the axis names:
df = df.reset_index().rename_axis(None).rename_axis(None, axis=1)
print (df)
   PO ID PO Name Region  07/2016  08/2016  09/2016
0      1      AA  North    300.0    300.0    500.0
1      2      BB  South    200.0    400.0      NaN
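As a rough end-to-end sketch of this last step, the snippet below reshapes the duplicated sample and then flattens the result. The names `sample_df`, `reshaped` and `flat_df` are only illustrative. Note that after reset_index the row index is a fresh default index with no name, so only the leftover 'Date' label on the columns still needs to be cleared.

```python
import pandas as pd

# Sample data from the answer, including the duplicated 07/2016 row for PO 1.
sample_df = pd.DataFrame({
    "PO ID": [1, 1, 2, 1, 2, 1],
    "PO Name": ["AA", "AA", "BB", "AA", "BB", "AA"],
    "Region": ["North", "North", "South", "North", "South", "North"],
    "Date": ["07/2016", "07/2016", "07/2016", "08/2016", "08/2016", "09/2016"],
    "Price": [100, 500, 200, 300, 400, 500],
})

reshaped = sample_df.pivot_table(index=["PO ID", "PO Name", "Region"],
                                 columns="Date", values="Price", aggfunc="mean")

# reset_index moves the three index levels back into regular columns;
# rename_axis(None, axis=1) drops the 'Date' name from the column axis.
flat_df = reshaped.reset_index().rename_axis(None, axis=1)

print(flat_df.columns.tolist())
print(flat_df)
```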