I'm trying to use pivot_table in Dask with the following dataframe:
         date  store_nbr  item_nbr  unit_sales  year  month
0  2013-01-01         25    103665         7.0  2013      1
1  2013-01-01         25    105574         1.0  2013      1
2  2013-01-01         25    105575         2.0  2013      1
3  2013-01-01         25    108079         1.0  2013      1
4  2013-01-01         25    108701         1.0  2013      1
When I try to pivot_table as follows:
ddf.pivot_table(values='unit_sales', index={'store_nbr','item_nbr'},
columns={'year','month'}, aggfunc={'mean','sum'})
I got this error:
ValueError: 'index' must be the name of an existing column
And if I use only one value for the index and columns parameters, as follows:
df.pivot_table(values='unit_sales', index='store_nbr',
columns='year', aggfunc={'sum'})
I got this error:
ValueError: 'columns' must be category dtype
That error is telling you that Dask DataFrame expects the column passed to the columns keyword to have a categorical dtype. Dask needs this so that it can determine the output columns up front, even though the computation is lazy. You can convert the column as follows:
df = df.categorize(columns=['year'])