I would like to know if it is possible to get the number of unique items in a given column after a groupby aggregation with Dask. I don't see anything like this in the documentation. It is available on pandas DataFrames and is really useful. I've seen some issues related to this, but I am not sure whether it is implemented.
Can someone give me some hints about this?
To implement nunique in a Dask groupby you have to use a custom aggregate function (dd.Aggregation): nunique is not an algebraic aggregation, so each partition has to carry its set of seen values through the reduce step before the distinct count can be taken.
import pandas as pd
import dask.dataframe as dd

def chunk(s):
    '''
    The function applied to each
    individual partition (map).
    '''
    return s.apply(lambda x: list(set(x)))

def agg(s):
    '''
    The function which will aggregate
    the results from all the partitions (reduce).
    '''
    s = s._selected_obj
    return s.groupby(level=list(range(s.index.nlevels))).sum()

def finalize(s):
    '''
    The optional function that will be
    applied to the result of the agg function.
    '''
    return s.apply(lambda x: len(set(x)))

tunique = dd.Aggregation('tunique', chunk, agg, finalize)

df = pd.DataFrame({
    'col': [0, 0, 1, 1, 2, 3, 3] * 10,
    'g0': ['a', 'a', 'b', 'a', 'b', 'b', 'a'] * 10,
})
ddf = dd.from_pandas(df, npartitions=10)
res = ddf.groupby(['col']).agg({'g0': tunique}).compute()
print(res)
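For the sample frame above the result is deterministic (col 0 sees only 'a', col 1 sees 'a' and 'b', col 2 sees only 'b', col 3 sees 'a' and 'b'), so the print should show something like:

     g0
col
0     1
1     2
2     1
3     2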
To expand on this comment, you can use nunique on a SeriesGroupBy directly:
import pandas as pd
import dask.dataframe as dd

d = {'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]}
df = pd.DataFrame(data=d)
ddf = dd.from_pandas(df, npartitions=2)

# nunique is available directly on the grouped series
ddf.groupby(['col1']).col2.nunique().to_frame().compute()
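Since every col1 value in this toy frame appears exactly once, each group holds a single distinct col2 value, so the computed frame should be:

      col2
col1
1        1
2        1
3        1
4        1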
See https://github.com/dask/dask/issues/6280 for more discussion.