How to use Dask Pivot_table?

Question

I'm trying to use pivot_table in Dask with the following dataframe:

         date  store_nbr  item_nbr  unit_sales  year  month
0  2013-01-01         25    103665         7.0  2013      1
1  2013-01-01         25    105574         1.0  2013      1
2  2013-01-01         25    105575         2.0  2013      1
3  2013-01-01         25    108079         1.0  2013      1
4  2013-01-01         25    108701         1.0  2013      1

When I try to pivot_table as follows:

ddf.pivot_table(values='unit_sales', index={'store_nbr','item_nbr'}, 
                                  columns={'year','month'}, aggfunc={'mean','sum'})

I got this error:

ValueError: 'index' must be the name of an existing column

And if I pass only one column name to each of the index and columns parameters, as follows:

df.pivot_table(values='unit_sales', index='store_nbr', 
                                  columns='year', aggfunc={'sum'})

I got this error:

ValueError: 'columns' must be category dtype

MRocklin · Accepted Answer

That error is telling you that Dask dataframe expects the column passed to the columns keyword to have a categorical dtype. It needs this so that it can define the output columns correctly, even while the operation is still lazy. You can accomplish this as follows:

df = df.categorize(columns=['year'])

Tags:

dataframe

pivot-table

dask

ambigus9
