Does a sorted dataframe keep its order after groupby? [duplicate]

I would like to keep the latest entry per group in a dataframe:

from datetime import date
import pandas as pd    
data = [
    ['A', date(2018,2,1), "I want this"],
    ['A', date(2018,1,1), "Don't want"],
    ['B', date(2019,4,1), "Don't want"],
    ['B', date(2019,5,1), "I want this"]]

df = pd.DataFrame(data, columns=['name', 'date', 'result'])

The following does what I want (found here, with credit to the original author):

df.sort_values('date').groupby('name').tail(1)
  name        date       result
0    A  2018-02-01  I want this
3    B  2019-05-01  I want this

But how do I know that the order is always preserved when doing a groupby on a sorted dataframe like df? Is this documented somewhere?

Michael Dorner asked Mar 02 '26 12:03


1 Answer

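Not the order of the groups. By default, groupby sorts the group keys, so the groups in an aggregated result come back ordered by name rather than by date. Replace A with Z and aggregate (for example with last()) to see it. The order of rows within each group is preserved.

To keep the groups in their order of appearance, use sort=False: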

df.sort_values('date').groupby('name', sort=False).tail(1)
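
A minimal sketch to check this on the question's data, with A renamed to Z as suggested. The variable names (by_date, keys_default, and so on) are made up for illustration, and the behaviour shown assumes a reasonably recent pandas version. It compares an aggregation (last()) with the filtration tail(1), with and without sort=False:

```python
from datetime import date
import pandas as pd

# Same data as in the question, with 'A' renamed to 'Z'
data = [
    ['Z', date(2018, 2, 1), "I want this"],
    ['Z', date(2018, 1, 1), "Don't want"],
    ['B', date(2019, 4, 1), "Don't want"],
    ['B', date(2019, 5, 1), "I want this"]]
df = pd.DataFrame(data, columns=['name', 'date', 'result'])

# Hypothetical helper name: the frame sorted by date
by_date = df.sort_values('date')

# Aggregation: group keys are sorted by default,
# and kept in order of first appearance with sort=False
keys_default = list(by_date.groupby('name')['result'].last().index)
keys_unsorted = list(by_date.groupby('name', sort=False)['result'].last().index)

# Filtration: tail(1) selects rows of the sorted frame,
# so the index labels show which rows were kept
rows_default = list(by_date.groupby('name').tail(1).index)
rows_unsorted = list(by_date.groupby('name', sort=False).tail(1).index)

print(keys_default, keys_unsorted)
print(rows_default, rows_unsorted)
```

If both tail(1) calls print the same row labels, the sort argument only matters for the order of the group keys in aggregated results. The row picked per group is the latest one in either case.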
mozway answered Mar 05 '26 02:03