Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pivoting pandas DataFrame -- AssertionError: Index length did not match values

Tags:

python

pandas

I have a pandas.DataFrame that won't pivot the way I expect. While pivot_table properly arranges everything, the fact that it uses aggregate functions to get there is off-putting. In addition, pivot_table seems to return an unnecessarily complex object rather than a flat data frame.

Consider the following example

import pandas as pd
df = pd.DataFrame({'firstname':['Jon']*3+['Amy']*2,
                   'lastname':['Cho']*3+['Frond']*2,
                   'vehicle':['bike', 'car', 'plane','bike','plane'],
                   'weight':[81.003]*3+[65.6886]*2,
                   'speed':[29.022, 95.1144, 302.952, 27.101, 344.2],})
df.set_index(['firstname','lastname','weight'])

print('------  Unnecessary pivot_table does averaging  ------')
print(pd.pivot_table(df, values='speed',
                         rows='firstname','lastname','weight'],
                         cols='vehicle'))

print('------ pivot method dies  ------')
print(df.pivot( index=['firstname','lastname','weight'],
                columns='vehicle',
                values='speed'))

The pivot_table results are

vehicle                       bike      car    plane
firstname lastname weight                           
Amy       Frond    65.6886  27.101      NaN  344.200
Jon       Cho      81.0030  29.022  95.1144  302.952

Is there a way to get pivot to give essentially the same output as the pivot_table command did (but hopefully flatter and neater)? Failing that, how do I flatten the output of pivot_table? What I want as output is something more like this:

firstname lastname weight     bike      car    plane                           
Amy       Frond    65.6886  27.101      NaN  344.200
Jon       Cho      81.0030  29.022  95.1144  302.952
like image 742
Brian B Avatar asked Mar 22 '23 11:03

Brian B


1 Answers

If you don't want the aggregation of pivot_table, you indeed need the pivot function. However, pivot does not work with providing multiple index-columns (actually I don't know why). But, there is a similar function to pivot, unstack, which works the same but based on the (multi) index instead of columns.

So to use this, you can first set the columns you want as index/column labels in the result as index:

df2 = df.set_index(['firstname','lastname','weight', 'vehicle'])

and then unstack on the last level (default), so on 'vehicle' (which become the column labels):

In [3]: df2.unstack()
Out[3]:
                             speed
vehicle                       bike      car    plane
firstname lastname weight
Amy       Frond    65.6886  27.101      NaN  344.200
Jon       Cho      81.0030  29.022  95.1144  302.952

And if you don't want the multi-index, you can 'flatten' the result with reset_index.
The only possible problem that you can have with this is that the columns also have two levels, so you can first remove the first level, and then reset the index to become a really flat dataframe:

In [17]: df3 = df2.unstack()

In [18]: df3.columns = df3.columns.droplevel(0)

In [19]: df3.reset_index()
Out[19]:
vehicle firstname lastname   weight    bike      car    plane
0             Amy    Frond  65.6886  27.101      NaN  344.200
1             Jon      Cho  81.0030  29.022  95.1144  302.952
like image 56
joris Avatar answered Apr 06 '23 16:04

joris