
Does groupby preserve order among groups? In which way?

While answering the question "Sort a pandas's dataframe series by month name?", we ran into some surprising behavior of groupby.

import pandas as pd

df = pd.DataFrame([["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21], ["aug", 11], ["jan", 11], ["jan", 1]], columns=["Month", "Price"])
df["Month_dig"] = pd.to_datetime(df.Month, format='%b', errors='coerce').dt.month
df.sort_values(by="Month_dig", inplace=True)

# Now df looks like
    Month   Price   Month_dig
1   jan     40      1
5   jan     11      1
6   jan     1       1
2   mar     11      3
3   aug     21      8
4   aug     11      8
0   dec     12      12

total = (df.groupby(df['Month'])['Price'].mean())
print(total)
# output
Month
aug    16.000000
dec    12.000000
jan    17.333333
mar    11.000000
Name: Price, dtype: float64

It seems that total is sorted alphabetically by month name, whereas the OP and I were expecting

Month
jan    17.333333
mar    11.000000
aug    16.000000
dec    12.000000
Name: Price, dtype: float64

What's the mechanism behind groupby? I know from the documentation that it preserves the order of rows within each group, but is there a rule for the order among groups? To me, the natural group order would be ["jan", "mar", "aug", "dec"], since that is how the data in df is sorted.

P.S. From ["aug", "dec", "jan", "mar"], it seems the group names are sorted alphabetically.
I am using Python 3.6 and pandas 0.20.3.

Tai asked Jan 28 '23 22:01


1 Answer

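pandas.DataFrame.groupby has a sort argument that defaults to True, which sorts the result by group key (here, alphabetically by month name). Pass sort=False to keep the groups in the order in which they first appear in df: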

total = (df.groupby(df['Month'], sort=False)['Price'].mean())
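
To compare the two behaviors side by side, here is a minimal self-contained sketch built on the data from the question. The names sorted_keys and first_seen are only illustrative, and kind="mergesort" is an addition not present in the question; it is used here on the assumption that a stable sort is wanted so rows keep their relative order within each month.

```python
import pandas as pd

df = pd.DataFrame(
    [["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
     ["aug", 11], ["jan", 11], ["jan", 1]],
    columns=["Month", "Price"],
)
df["Month_dig"] = pd.to_datetime(df["Month"], format="%b").dt.month

# Assumption: mergesort is chosen because it is stable, so ties on
# Month_dig keep the order they had in the original frame.
df = df.sort_values(by="Month_dig", kind="mergesort")

# Default (sort=True): the result index is sorted by group key.
sorted_keys = df.groupby("Month")["Price"].mean()

# sort=False: groups appear in the order their keys first occur in df.
first_seen = df.groupby("Month", sort=False)["Price"].mean()

print(list(sorted_keys.index))  # ['aug', 'dec', 'jan', 'mar']
print(list(first_seen.index))   # ['jan', 'mar', 'aug', 'dec']
```

Because df was sorted by Month_dig before grouping, the order of first appearance is the calendar order, which is the result the question was expecting.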
Patrick Haugh answered Feb 15 '23 21:02