I'm using pandas to reshape some string/numeric-valued responses, and I've run into some behavior that's a bit counterintuitive.
Can someone explain the difference between the dataframes stacked and pivoted below, and why pivoted2 raises the DataError even though no aggfunc is passed?
import pandas as pd
d = {'ID': pd.Series(['x']*3 + ['y']*3,index = range(6)),
'Count': pd.Series([1,2,1,1,1,1], index = range(6)),
'Value_type': pd.Series(['foo','foo','bar','foo','bar','baz'], index = range(6)),
'Value': pd.Series(range(1,7),index = range(6))}
df = pd.DataFrame(d)
d2 = {'ID': pd.Series(['x']*3 + ['y']*3,index = range(6)),
'Count': pd.Series([1,2,1,1,1,1], index = range(6)),
'Value_type': pd.Series(['foo','foo','bar','foo','bar','baz'], index = range(6)),
'Value': pd.Series(list('abcdef'),index = range(6))}
df2 = pd.DataFrame(d2)
restacked = df.set_index(['ID','Count','Value_type']).unstack()
print restacked
restacked2 = df2.set_index(['ID','Count','Value_type']).unstack()
print restacked2
pivoted = pd.pivot_table(df,rows = ['ID','Count'],cols = 'Value_type',values = 'Value')
print pivoted
## raises DataError('No numeric types to aggregate'),
## even though no aggregation function is passed.
pivoted2 = pd.pivot_table(df2,rows = ['ID','Count'],cols = 'Value_type',values = 'Value')
print pivoted2
The default agg function is np.mean (even though you didn't pass it explicitly this is what is being used), which doesn't make sense on strings, in fact it raises an AttributeError when passed an object array - so pandas complains when you try to do this.
You could pass np.sum:
In [11]: pd.pivot_table(df2, rows=['ID', 'Count'], cols='Value_type',
values='Value', aggfunc=np.sum)
Out[11]:
Value_type bar baz foo
ID Count
x 1 c NaN a
2 NaN NaN b
y 1 e f d
Or take the first item using iloc[0]:
In [12]: pd.pivot_table(df2, rows=['ID', 'Count'], cols='Value_type',
values='Value', aggfunc=lambda x: x.iloc[0])
Out[12]:
Value_type bar baz foo
ID Count
x 1 c NaN a
2 NaN NaN b
y 1 e f d
Note: that this is the same as pivoted2['Value'], you can make this output the same as pivoted2 if you pass a list to values to aggregate:
In [13]: pd.pivot_table(df2, rows=['ID', 'Count'], cols=['Value_type'],
values=['Value'], aggfunc=lambda x: x.iloc[0])
Out[13]:
Value
Value_type bar baz foo
ID Count
x 1 c NaN a
2 NaN NaN b
y 1 e f d
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With