 

Drop unused categories using groupby on a categorical variable in pandas

Tags:

python

pandas

As described in the pandas documentation (Categorical Data - Operations), groupby shows “unused” categories by default:

In [118]: cats = pd.Categorical(["a","b","b","b","c","c","c"], categories=["a","b","c","d"])

In [119]: df = pd.DataFrame({"cats":cats,"values":[1,2,2,2,3,4,5]})

In [120]: df.groupby("cats").mean()
Out[120]: 
      values
cats        
a        1.0
b        2.0
c        4.0
d        NaN

How can I obtain the result with the “unused” categories dropped? For example:

      values
cats        
a        1.0
b        2.0
c        4.0
asked Jan 02 '18 17:01 by tales



People also ask

How do I drop a category in pandas?

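To remove specific categories, use the remove_categories() method, which is available on a Categorical, on a categorical Series through the .cat accessor, and on a CategoricalIndex. To drop only the categories that no value uses, use remove_unused_categories().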
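A minimal sketch of both methods on the data from the question. It also shows that trimming the column with remove_unused_categories() before grouping keeps "d" out of the groupby result. The variable names are illustrative only.

```python
import pandas as pd

# Same data as the question: "d" is a declared category with no values.
cats = pd.Categorical(["a", "b", "b", "b", "c", "c", "c"],
                      categories=["a", "b", "c", "d"])

# remove_categories drops the categories you name; values that used them
# would become NaN (none do here, since "d" is unused).
without_d = cats.remove_categories(["d"])

# remove_unused_categories drops every category that no value uses.
trimmed = cats.remove_unused_categories()

# Trimming the column before grouping keeps "d" out of the groupby result,
# even with observed=False.
df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})
df["cats"] = df["cats"].cat.remove_unused_categories()
result = df.groupby("cats", observed=False).mean()
print(result)
```

Unlike observed=True, this approach changes the column's dtype, so any later code that expects "d" to be a valid category will no longer see it.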

How do you group categorical variables in pandas?

Use the groupby() method, usually followed by an aggregation function such as mean(), sum() or count() that defines how the values in each group are combined. When the grouping columns are categorical and observed=False, the result contains every combination of their categories, including combinations that never occur in the data.
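A small sketch of grouping by two categorical columns, with made-up example data (the column names color, size and qty are hypothetical). It contrasts the full Cartesian product returned with observed=False against the observed combinations only.

```python
import pandas as pd

# Two categorical grouping columns, each with a category that never occurs.
df = pd.DataFrame({
    "color": pd.Categorical(["red", "red", "blue"],
                            categories=["red", "blue", "green"]),
    "size": pd.Categorical(["S", "L", "S"], categories=["S", "L", "XL"]),
    "qty": [1, 2, 3],
})

# observed=False: one row per combination of categories (3 x 3 = 9 rows),
# including combinations that never appear in the data.
all_combos = df.groupby(["color", "size"], observed=False)["qty"].sum()

# observed=True: only the combinations actually present (3 rows),
# aggregated with several functions at once.
seen_combos = df.groupby(["color", "size"], observed=True)["qty"].agg(
    ["mean", "sum", "count"]
)
print(seen_combos)
```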

Does Groupby preserve order?

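Yes, groupby preserves the order of rows within each group. The groups themselves are sorted by key by default (sort=True); pass sort=False to keep them in order of first appearance.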
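A quick check of both points on a toy DataFrame (the data is invented for illustration): rows keep their original relative order inside each group, while the order of the groups depends on the sort parameter.

```python
import pandas as pd

df = pd.DataFrame({"key": ["b", "a", "b", "a", "b"], "val": [1, 2, 3, 4, 5]})

# Rows inside each group keep their original relative order.
within = {key: group["val"].tolist() for key, group in df.groupby("key")}
print(within)

# Group keys are sorted by default; sort=False keeps first-appearance order.
sorted_keys = df.groupby("key")["val"].first().index.tolist()
appearance_keys = df.groupby("key", sort=False)["val"].first().index.tolist()
print(sorted_keys, appearance_keys)
```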


1 Answer

Since pandas 0.23, you can pass observed=True to groupby so that only the categories actually present in the data appear in the result.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
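Applied to the question's example, a minimal sketch looks like this. It assumes pandas 0.23 or later. In recent releases (2.1 and later, as far as I know), leaving observed unspecified with a categorical grouper also emits a FutureWarning about the default changing to True, so passing it explicitly is worthwhile either way.

```python
import pandas as pd

cats = pd.Categorical(["a", "b", "b", "b", "c", "c", "c"],
                      categories=["a", "b", "c", "d"])
df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})

# observed=True keeps only categories that actually occur in "cats",
# so the unused category "d" is left out of the result.
result = df.groupby("cats", observed=True).mean()
print(result)
```

This leaves the column's categories untouched; only the grouped output skips the unobserved ones.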

answered Sep 24 '22 15:09 by Dienow
