As per Categorical Data - Operations, by default groupby will show “unused” categories:
In [118]: cats = pd.Categorical(["a","b","b","b","c","c","c"], categories=["a","b","c","d"])
In [119]: df = pd.DataFrame({"cats":cats,"values":[1,2,2,2,3,4,5]})
In [120]: df.groupby("cats").mean()
Out[120]:
      values
cats
a        1.0
b        2.0
c        4.0
d        NaN
How can I obtain the result with the “unused” categories dropped? For example:
      values
cats
a        1.0
b        2.0
c        4.0
To remove specific categories from a CategoricalIndex, use the remove_categories() method in pandas. To drop only the categories that never occur in the data, use remove_unused_categories().
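As a minimal sketch of this approach, using the question's data: the unused category can be stripped from the column before grouping. The names `trimmed` and `result` are illustrative only, and `observed=False` is passed explicitly so the example does not rely on the default of any particular pandas version.

```python
import pandas as pd

cats = pd.Categorical(["a", "b", "b", "b", "c", "c", "c"],
                      categories=["a", "b", "c", "d"])
df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})

# Drop every category that never occurs in the column ("d" here).
trimmed = df["cats"].cat.remove_unused_categories()
remaining = list(trimmed.cat.categories)

# Group on the trimmed column; even with observed=False there is
# no "d" row, because "d" is no longer a category.
result = df.assign(cats=trimmed).groupby("cats", observed=False).mean()
print(result)
```

This changes the categories of the grouping column itself, so it suits cases where the unused category is not needed anywhere downstream.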
Grouping is done with the groupby() method in pandas. When the grouping columns are categorical, it returns by default every combination of their categories, including combinations that never occur in the data. A groupby() call is followed by an aggregate function such as mean(), sum(), or count(), which determines how the values within each group are combined.
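The "all combinations" behaviour is easiest to see with two categorical grouping columns. The sketch below uses a small made-up frame (the column names `size` and `color` are hypothetical, not from the question) and the `observed` parameter of groupby() to switch between the two behaviours.

```python
import pandas as pd

# Hypothetical data: three rows, two categorical grouping columns.
df2 = pd.DataFrame({
    "size": pd.Categorical(["s", "s", "l"], categories=["s", "l"]),
    "color": pd.Categorical(["red", "blue", "red"], categories=["red", "blue"]),
    "values": [1, 2, 3],
})

# observed=False: one row per combination of categories (2 x 2 = 4),
# including ("l", "blue"), which never appears in the data.
all_combos = df2.groupby(["size", "color"], observed=False)["values"].sum()

# observed=True: only the combinations that actually occur.
seen_only = df2.groupby(["size", "color"], observed=True)["values"].sum()

print(all_combos)
print(seen_only)
```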
Groupby preserves the order of rows within each group.
Since version 0.23 you can specify observed=True in the groupby call to achieve the desired behavior.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
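Applied to the data from the question, this might look like the sketch below. The variable names `with_unused` and `only_observed` are illustrative. Passing `observed` explicitly in both calls keeps the result independent of the library's default, which may differ between pandas versions.

```python
import pandas as pd

cats = pd.Categorical(["a", "b", "b", "b", "c", "c", "c"],
                      categories=["a", "b", "c", "d"])
df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})

# observed=False keeps the unused category "d" as a NaN row.
with_unused = df.groupby("cats", observed=False).mean()

# observed=True returns only the categories present in the data.
only_observed = df.groupby("cats", observed=True).mean()

print(only_observed)
```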