Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reduce multi-index/multi-level dataframe to single index, single level

Say I have a dataframe that looks like this:

>>> df
   Year   MPG VehicleType FuelType
0  2000  20.5         Car      Gas
1  2009  22.3         Car      Gas
2  2017  50.9         Car      Gas
3  2000  14.7         Car   Diesel
4  2009  18.0         Car   Diesel
5  2017  22.2         Car   Diesel

I need to split the VehicleType column into two columns based on the value of the FuelType column, using the Year column as an index. I used pivot_table to split the columns correctly.

>>> pd.pivot_table(df, columns=['VehicleType', 'FuelType'], values='MPG', index=['Year'])
VehicleType    Car      
FuelType    Diesel   Gas
Year                    
2000          14.7  20.5
2009          18.0  22.3
2017          22.2  50.9

This is great and all, but it results in a multi-indexed dataframe, which for my purposes I don't want.

I am trying to get a result that looks something like this:

Year Car_Diesel_MPG Car_Gas_MPG
2000           14.7        20.5
2009           18.0        22.3
2017           22.2        50.9

My efforts at trying to achieve this have resulted in some pretty nasty looking code. Is there a simple way to do this?

like image 334
darksinge Avatar asked Dec 01 '22 14:12

darksinge


2 Answers

Using set_axis, map, and join

df2.set_axis(df2.columns.map('_'.join), axis=1, inplace=False).add_suffix('_MPG')

      Car_Diesel_MPG  Car_Gas_MPG
Year                             
2000            14.7         20.5
2009            18.0         22.3
2017            22.2         50.9

groupby with a dict

m = {t: '_'.join(t) for t in df2.columns}
df2.groupby(m, axis=1).mean().add_suffix('_MPG')

      Car_Diesel_MPG  Car_Gas_MPG
Year                             
2000            14.7         20.5
2009            18.0         22.3
2017            22.2         50.9

Either of these can reset_index

m = {t: '_'.join(t) for t in df2.columns}
df2.groupby(m, axis=1).mean().add_suffix('_MPG').reset_index()

   Year  Car_Diesel_MPG  Car_Gas_MPG
0  2000            14.7         20.5
1  2009            18.0         22.3
2  2017            22.2         50.9

groupby instead of pivot_table

df.groupby(
    ['Year', df.VehicleType.str.cat(df.FuelType, sep='_').add('_MPG').values]
).MPG.sum().unstack().reset_index()

   Year  Car_Diesel_MPG  Car_Gas_MPG
0  2000            14.7         20.5
1  2009            18.0         22.3
2  2017            22.2         50.9
like image 51
piRSquared Avatar answered Dec 03 '22 05:12

piRSquared


You could just concatenate the levels of your multiindex using +, and use that as the columns of your pivot table:

# Same as your original privot table:
df2 = pd.pivot_table(df, columns=['VehicleType', 'FuelType'], values='MPG', index=['Year'])

df2.columns = df2.columns.get_level_values(0)+'_'+df2.columns.get_level_values(1)+'_MPG'

>>> df2
      Car_Diesel_MPG  Car_Gas_MPG
Year                             
2000            14.7         20.5
2009            18.0         22.3
2017            22.2         50.9
like image 34
sacuL Avatar answered Dec 03 '22 04:12

sacuL