Pandas Dataframe Stacking versus Pivoting

Question

I'm using pandas to reshape some string/numeric-valued responses, and I've run into some behavior that's a bit counterintuitive.

Can someone explain the difference between the dataframes stacked and pivoted below, and why pivoted2 raises the DataError even though no aggfunc is passed?

import pandas as pd

d = {'ID': pd.Series(['x']*3 + ['y']*3,index = range(6)),
     'Count': pd.Series([1,2,1,1,1,1], index = range(6)),
     'Value_type': pd.Series(['foo','foo','bar','foo','bar','baz'], index = range(6)),
     'Value': pd.Series(range(1,7),index = range(6))}
df = pd.DataFrame(d)

d2 = {'ID': pd.Series(['x']*3 + ['y']*3,index = range(6)),
     'Count': pd.Series([1,2,1,1,1,1], index = range(6)),
     'Value_type': pd.Series(['foo','foo','bar','foo','bar','baz'], index = range(6)),
     'Value': pd.Series(list('abcdef'),index = range(6))}
df2 = pd.DataFrame(d2)

restacked = df.set_index(['ID','Count','Value_type']).unstack()
print restacked

restacked2 =  df2.set_index(['ID','Count','Value_type']).unstack()
print restacked2

pivoted = pd.pivot_table(df,rows = ['ID','Count'],cols = 'Value_type',values = 'Value')
print pivoted

## raises DataError('No numeric types to aggregate'), 
## even though no aggregation function is passed.
pivoted2 = pd.pivot_table(df2,rows = ['ID','Count'],cols = 'Value_type',values = 'Value')
print pivoted2

Andy Hayden · Accepted Answer

The default agg function is np.mean (even though you didn't pass it explicitly this is what is being used), which doesn't make sense on strings, in fact it raises an AttributeError when passed an object array - so pandas complains when you try to do this.

You could pass np.sum:

In [11]: pd.pivot_table(df2, rows=['ID', 'Count'], cols='Value_type',
                        values='Value', aggfunc=np.sum)
Out[11]: 
Value_type  bar  baz foo
ID Count                
x  1          c  NaN   a
   2        NaN  NaN   b
y  1          e    f   d

Or take the first item using iloc[0]:

In [12]: pd.pivot_table(df2, rows=['ID', 'Count'], cols='Value_type',
                        values='Value', aggfunc=lambda x: x.iloc[0])
Out[12]: 
Value_type  bar  baz foo
ID Count                
x  1          c  NaN   a
   2        NaN  NaN   b
y  1          e    f   d

Note: that this is the same as pivoted2['Value'], you can make this output the same as pivoted2 if you pass a list to values to aggregate:

In [13]: pd.pivot_table(df2, rows=['ID', 'Count'], cols=['Value_type'], 
                        values=['Value'], aggfunc=lambda x: x.iloc[0])
Out[13]: 
           Value         
Value_type   bar  baz foo
ID Count                 
x  1           c  NaN   a
   2         NaN  NaN   b
y  1           e    f   d

Pandas Dataframe Stacking versus Pivoting

Tags:

python

pandas

Michael K

1 Answers

Andy Hayden

Recent Activity

Donate For Us

Pandas Dataframe Stacking versus Pivoting

Tags:

python

pandas

Michael K

1 Answers

Andy Hayden

Related questions

Recent Activity

Donate For Us