Drop unused categories when using groupby on a categorical variable in pandas

Tags: python, pandas
As per Categorical Data - Operations, by default groupby will show “unused” categories:

In [118]: cats = pd.Categorical(["a","b","b","b","c","c","c"], categories=["a","b","c","d"])

In [119]: df = pd.DataFrame({"cats":cats,"values":[1,2,2,2,3,4,5]})

In [120]: df.groupby("cats").mean()
Out[120]: 
      values
cats        
a        1.0
b        2.0
c        4.0
d        NaN

How can I obtain the result with the “unused” categories dropped? For example:

      values
cats        
a        1.0
b        2.0
c        4.0


asked Jan 02 '18 17:01

tales

1 Answer

Since pandas 0.23, you can pass observed=True to the groupby call so that only categories that actually appear in the data are included in the result.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
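
As a minimal sketch, here is the frame from the question grouped with observed=True. The variable name `result` is only illustrative, and the snippet assumes pandas 0.23 or later:

```python
import pandas as pd

# Same data as in the question: category "d" is declared but never used.
cats = pd.Categorical(
    ["a", "b", "b", "b", "c", "c", "c"],
    categories=["a", "b", "c", "d"],
)
df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})

# observed=True keeps only the categories that occur in the data,
# so no row is produced for the unused category "d".
result = df.groupby("cats", observed=True).mean()
print(result)
```

Passing observed explicitly also means the result does not depend on the default value of the parameter, which may differ between pandas versions.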


answered Sep 24 '22 15:09

Dienow
