
Dask Dataframe groupby has no len()

If you have a groupby object built from a Dask dataframe, why does len(<groupby object>) raise an error? Is this a bug or a feature?

Back2Basics asked Oct 16 '22 22:10



1 Answer

This just hasn't been implemented. You might want to raise an issue (or, better yet, a pull request). Pragmatically, I would just call nunique on the series you are grouping by, since the number of groups equals the number of distinct key values.

Before

g = df.groupby(df.x + df.y)
result = len(g)

After

result = (df.x + df.y).nunique()

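Operationally this is nicer for two reasons. First, it can stay lazy: the result of len in Python must be a concrete integer, so len would have to trigger computation immediately. Second, you can choose the nunique_approx variant, which returns an approximate count and will be far faster on large data.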

MRocklin answered Oct 21 '22 00:10
