
Summarize categorical data in Dask DataFrame

Tags:

python

dask

By default, the describe method of a Dask DataFrame summarizes only numerical columns. According to the docs, I should be able to get descriptions of categorical columns by passing the include parameter. However,

df.describe(include=['category']).compute()

raises

TypeError: describe() got an unexpected keyword argument 'include'.

I also tried a slightly different approach:

df.select_dtypes(include=['category']).describe().compute()

and this time I got

ValueError: DataFrame contains only non-numeric data.

Could you please advise on the best way to summarize categorical columns in a Dask DataFrame?

grześ asked Jan 24 '18 13:01



1 Answer

Summarizing only numerical or object columns

  1. To call describe() on just the numerical columns, use describe(include=[np.number]).
  2. To call describe() on just the object (string) columns, use describe(include=['O']).

Quote: Pandas 'describe' is not returning summary of all columns
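The suggestions above are pandas calls (the quoted thread is about pandas). A minimal sketch on a small, hypothetical pandas DataFrame, which also shows that pandas accepts include=['category'], the call the question tried on a Dask DataFrame. The string column is created with dtype=object explicitly, since newer pandas versions may infer a dedicated string dtype instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 4.0],
    # dtype=object is explicit because newer pandas may infer a string dtype
    "name": pd.Series(["a", "b", "a", "c"], dtype=object),
    "color": pd.Categorical(["red", "red", "blue", "red"]),
})

# Only numeric columns: count, mean, std, min, quartiles, max
numeric_summary = df.describe(include=[np.number])

# Only object (string) columns: count, unique, top, freq
object_summary = df.describe(include=["O"])

# Only categorical columns: also count, unique, top, freq
category_summary = df.describe(include=["category"])

print(numeric_summary, object_summary, category_summary, sep="\n\n")
```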

Hugo answered Oct 03 '22 10:10
