Say I have a dataframe that looks like this:
>>> df
Year MPG VehicleType FuelType
0 2000 20.5 Car Gas
1 2009 22.3 Car Gas
2 2017 50.9 Car Gas
3 2000 14.7 Car Diesel
4 2009 18.0 Car Diesel
5 2017 22.2 Car Diesel
I need to split the VehicleType
column into two columns based on the value of the FuelType
column, using the Year
column as an index. I used pivot_table
to split the columns correctly.
>>> pd.pivot_table(df, columns=['VehicleType', 'FuelType'], values='MPG', index=['Year'])
VehicleType Car
FuelType Diesel Gas
Year
2000 14.7 20.5
2009 18.0 22.3
2017 22.2 50.9
This is great and all, but it results in a multi-indexed dataframe, which for my purposes I don't want.
I am trying to get a result that looks something like this:
Year Car_Diesel_MPG Car_Gas_MPG
2000 14.7 20.5
2009 18.0 22.3
2017 22.2 50.9
My efforts at trying to achieve this have resulted in some pretty nasty looking code. Is there a simple way to do this?
set_axis
, map
, and join
df2.set_axis(df2.columns.map('_'.join), axis=1, inplace=False).add_suffix('_MPG')
Car_Diesel_MPG Car_Gas_MPG
Year
2000 14.7 20.5
2009 18.0 22.3
2017 22.2 50.9
groupby
with a dict
m = {t: '_'.join(t) for t in df2.columns}
df2.groupby(m, axis=1).mean().add_suffix('_MPG')
Car_Diesel_MPG Car_Gas_MPG
Year
2000 14.7 20.5
2009 18.0 22.3
2017 22.2 50.9
Either of these can reset_index
m = {t: '_'.join(t) for t in df2.columns}
df2.groupby(m, axis=1).mean().add_suffix('_MPG').reset_index()
Year Car_Diesel_MPG Car_Gas_MPG
0 2000 14.7 20.5
1 2009 18.0 22.3
2 2017 22.2 50.9
groupby
instead of pivot_table
df.groupby(
['Year', df.VehicleType.str.cat(df.FuelType, sep='_').add('_MPG').values]
).MPG.sum().unstack().reset_index()
Year Car_Diesel_MPG Car_Gas_MPG
0 2000 14.7 20.5
1 2009 18.0 22.3
2 2017 22.2 50.9
You could just concatenate the levels of your multiindex using +
, and use that as the columns of your pivot table:
# Same as your original privot table:
df2 = pd.pivot_table(df, columns=['VehicleType', 'FuelType'], values='MPG', index=['Year'])
df2.columns = df2.columns.get_level_values(0)+'_'+df2.columns.get_level_values(1)+'_MPG'
>>> df2
Car_Diesel_MPG Car_Gas_MPG
Year
2000 14.7 20.5
2009 18.0 22.3
2017 22.2 50.9
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With