Weird behaviour with groupby on ordered categorical columns

MCVE

import pandas as pd

df = pd.DataFrame({
    'Cat': ['SF', 'W', 'F', 'R64', 'SF', 'F'],
    'ID': [1, 1, 1, 2, 2, 2]
})

# Make Cat an ordered categorical: R64 < SF < F < W
df.Cat = pd.Categorical(
    df.Cat, categories=['R64', 'SF', 'F', 'W'], ordered=True)

As you can see, I've defined an ordered categorical column on Cat. To verify, check df.Cat:

0     SF
1      W
2      F
3    R64
4     SF
5      F
Name: Cat, dtype: category
Categories (4, object): [R64 < SF < F < W]

I want to find the largest category PER ID. Doing groupby + max works.

df.groupby('ID').Cat.max()

ID
1    W
2    F
Name: Cat, dtype: object

But I don't want ID to be the index, so I specify as_index=False.

df.groupby('ID', as_index=False).Cat.max()

   ID Cat
0   1   W
1   2  SF

Oops! Now, the max is taken lexicographically. Can anyone explain whether this is intended behaviour? Or is this a bug?
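To make the difference concrete, here is a minimal sketch comparing the two orderings for the values in group ID 2. The variable names are only illustrative; the category order is the one defined in the MCVE.

```python
import pandas as pd

order = ['R64', 'SF', 'F', 'W']
group2 = ['R64', 'SF', 'F']  # the Cat values that belong to ID 2

# Plain strings compare lexicographically: 'F' < 'R64' < 'SF'.
lexicographic_max = max(group2)

# An ordered categorical compares by position in `categories` instead.
categorical_max = pd.Categorical(group2, categories=order, ordered=True).max()

print(lexicographic_max)  # SF
print(categorical_max)    # F
```

The as_index=False result above matches the first value, while the as_index=True result matches the second.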

Note, for this problem, the workaround is df.groupby('ID').Cat.max().reset_index().
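Spelled out as a self-contained sketch (same data as the MCVE, with `result` as an illustrative name), the workaround aggregates with ID as the index and only then moves it back into a column:

```python
import pandas as pd

df = pd.DataFrame({
    'Cat': ['SF', 'W', 'F', 'R64', 'SF', 'F'],
    'ID': [1, 1, 1, 2, 2, 2]
})
df['Cat'] = pd.Categorical(
    df['Cat'], categories=['R64', 'SF', 'F', 'W'], ordered=True)

# Take the max with ID as the index (the path that respects category order),
# then turn the index back into a regular column.
result = df.groupby('ID')['Cat'].max().reset_index()
print(result)
```

This gives W for ID 1 and F for ID 2, with ID as an ordinary column.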

Note, my pandas version:

>>> pd.__version__
'0.22.0'

asked Jun 09 '18 21:06

cs95

1 Answer

This is not intended behavior; it's a bug.

Source diving shows that the as_index flag sends the aggregation down two completely different code paths. With as_index=False, the grouper levels and names are simply ignored, and the values are just taken with a new range index. With as_index=True, they are clearly kept.
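If you would rather not depend on either code path, one alternative (not taken from the pandas source, just a sketch under the MCVE's assumptions) is to aggregate the integer category codes and map them back to categories afterwards. The names `max_codes` and `result` are illustrative, and the sketch assumes Cat has no missing values, since those carry the code -1.

```python
import pandas as pd

df = pd.DataFrame({
    'Cat': ['SF', 'W', 'F', 'R64', 'SF', 'F'],
    'ID': [1, 1, 1, 2, 2, 2]
})
df['Cat'] = pd.Categorical(
    df['Cat'], categories=['R64', 'SF', 'F', 'W'], ordered=True)

# Category codes are integers that follow the declared order
# (R64=0, SF=1, F=2, W=3), so an integer max is the categorical max.
max_codes = df['Cat'].cat.codes.groupby(df['ID']).max()

# Rebuild an ordered categorical from the winning codes.
result = pd.DataFrame({
    'ID': max_codes.index,
    'Cat': pd.Categorical.from_codes(
        max_codes.values, categories=df['Cat'].cat.categories, ordered=True),
})
print(result)
```

Because the aggregation runs on plain integers, the ordering cannot fall back to string comparison, and the result keeps its ordered categorical dtype.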

answered Oct 12 '22 03:10

firelynx

Tags: python, pandas, group-by, pandas-groupby, categorical-data
