I have a pandas.DataFrame that won't pivot the way I expect. While pivot_table
properly arranges everything, the fact that it uses aggregate functions to get there is off-putting. In addition, pivot_table
seems to return an unnecessarily complex object rather than a flat data frame.
Consider the following example
import pandas as pd
df = pd.DataFrame({'firstname':['Jon']*3+['Amy']*2,
'lastname':['Cho']*3+['Frond']*2,
'vehicle':['bike', 'car', 'plane','bike','plane'],
'weight':[81.003]*3+[65.6886]*2,
'speed':[29.022, 95.1144, 302.952, 27.101, 344.2],})
df.set_index(['firstname','lastname','weight'])
print('------ Unnecessary pivot_table does averaging ------')
print(pd.pivot_table(df, values='speed',
rows='firstname','lastname','weight'],
cols='vehicle'))
print('------ pivot method dies ------')
print(df.pivot( index=['firstname','lastname','weight'],
columns='vehicle',
values='speed'))
The pivot_table
results are
vehicle bike car plane
firstname lastname weight
Amy Frond 65.6886 27.101 NaN 344.200
Jon Cho 81.0030 29.022 95.1144 302.952
Is there a way to get pivot
to give essentially the same output as the pivot_table
command did (but hopefully flatter and neater)? Failing that, how do I flatten the output of pivot_table
? What I want as output is something more like this:
firstname lastname weight bike car plane
Amy Frond 65.6886 27.101 NaN 344.200
Jon Cho 81.0030 29.022 95.1144 302.952
If you don't want the aggregation of pivot_table
, you indeed need the pivot
function. However, pivot
does not work with providing multiple index-columns (actually I don't know why). But, there is a similar function to pivot, unstack
, which works the same but based on the (multi) index instead of columns.
So to use this, you can first set the columns you want as index/column labels in the result as index:
df2 = df.set_index(['firstname','lastname','weight', 'vehicle'])
and then unstack on the last level (default), so on 'vehicle' (which become the column labels):
In [3]: df2.unstack()
Out[3]:
speed
vehicle bike car plane
firstname lastname weight
Amy Frond 65.6886 27.101 NaN 344.200
Jon Cho 81.0030 29.022 95.1144 302.952
And if you don't want the multi-index, you can 'flatten' the result with reset_index
.
The only possible problem that you can have with this is that the columns also have two levels, so you can first remove the first level, and then reset the index to become a really flat dataframe:
In [17]: df3 = df2.unstack()
In [18]: df3.columns = df3.columns.droplevel(0)
In [19]: df3.reset_index()
Out[19]:
vehicle firstname lastname weight bike car plane
0 Amy Frond 65.6886 27.101 NaN 344.200
1 Jon Cho 81.0030 29.022 95.1144 302.952
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With