
Sorting a dataframe column using a list with repeated values

Tags:

python

pandas

Given a dataframe such as this:

df = pd.DataFrame({'Drink': ['Beer', 'Beer', 'Wine', 'Wine', 'Wine', 'Whisky', 'Whisky'], 'Units': [14, 5, 9, 15, 7, 12, 17]})
Drink Units
Beer 14
Beer 5
Wine 9
Wine 15
Wine 7
Whisky 12
Whisky 17

How can I sort the Drink column using a list like this one?

order = ['Wine', 'Beer', 'Whisky', 'Beer', 'Wine', 'Whisky']

So that the resulting dataframe looks like this:

Drink Units
Wine 9
Beer 14
Whisky 12
Beer 5
Wine 15
Whisky 17

The initial dataframe has more rows than elements in the list, so once everything in the list has matched to a row, the remaining rows can be dropped.

85sph asked Sep 01 '25 04:09


1 Answer

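On second thought, I think you really want this. Number the occurrences of each drink in both the list and the dataframe, then reindex the dataframe on the (Drink, occurrence) pairs taken from the list: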

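import pandas as pd

df = pd.DataFrame({'Drink': ['Beer', 'Beer', 'Wine', 'Wine', 'Wine', 'Whisky', 'Whisky'], 'Units': [14, 5, 9, 15, 7, 12, 17]})

order = ['Wine', 'Beer', 'Whisky', 'Beer', 'Wine', 'Whisky']
orders = pd.Series(order, name='Drink')

# Pair each list entry with its occurrence number: (Wine, 0), (Beer, 0), ..., (Beer, 1), (Wine, 1), ...
idx = pd.MultiIndex.from_arrays([orders, orders.groupby(orders).cumcount()],
                                names=['Drink', 'sortkey'])

# Number the rows of each drink the same way, then pull the rows out in list order.
# Rows whose (Drink, sortkey) pair is not in idx are dropped.
df_out = df.assign(sortkey=df.groupby('Drink', sort=False).cumcount())\
           .set_index(['Drink', 'sortkey'])\
           .reindex(idx).reset_index()\
           .drop('sortkey', axis=1)
print(df_out)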

Output:

    Drink  Units
0    Wine      9
1    Beer     14
2  Whisky     12
3    Beer      5
4    Wine     15
5  Whisky     17
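
The same occurrence-counter idea can also be written as a left merge, which some readers may find easier to follow than a MultiIndex reindex. This is only a sketch: the helper name `reorder_by_list` and the temporary `occurrence` column are made up for illustration, and it assumes the dataframe does not already have a column called `occurrence`.

```python
import pandas as pd

df = pd.DataFrame({'Drink': ['Beer', 'Beer', 'Wine', 'Wine', 'Wine', 'Whisky', 'Whisky'],
                   'Units': [14, 5, 9, 15, 7, 12, 17]})
order = ['Wine', 'Beer', 'Whisky', 'Beer', 'Wine', 'Whisky']


def reorder_by_list(frame, order, column):
    """Hypothetical helper: return rows of `frame` in the order given by `order`."""
    # One row per list entry, tagged with how many times that value appeared before it.
    target = pd.DataFrame({column: order})
    target['occurrence'] = target.groupby(column).cumcount()
    # Tag the dataframe rows with the same per-value counter.
    keyed = frame.assign(occurrence=frame.groupby(column).cumcount())
    # A left merge keeps the row order of `target`; unmatched dataframe rows are dropped.
    merged = target.merge(keyed, on=[column, 'occurrence'], how='left')
    return merged.drop(columns='occurrence')


result = reorder_by_list(df, order, 'Drink')
print(result)
```

If the list asks for more occurrences of a drink than the dataframe contains, the extra entries would come back with missing values in the other columns rather than raising an error.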

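IIUC, you want to sort based on Drink in a fixed category order, taking one row of each drink per round. Note that this keeps every row and cycles Wine, Beer, Whisky rather than following the repeated-values list exactly: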

import pandas as pd

df = pd.DataFrame({'Drink': ['Beer', 'Beer', 'Wine', 'Wine', 'Wine', 'Whisky', 'Whisky'], 'Units': [14, 5, 9, 15, 7, 12, 17]})

order = ['Wine', 'Beer', 'Whisky', 'Beer', 'Wine', 'Whisky']

# Ordered categorical so that Wine < Beer < Whisky when sorting
orderDtype = pd.CategoricalDtype(categories=['Wine', 'Beer', 'Whisky'], ordered=True)

# sortkey is the per-drink occurrence number; sort by round first, then by drink order
df_out = df.assign(sortkey=df.groupby('Drink', sort=False).cumcount(), Drink=df['Drink'].astype(orderDtype))\
           .sort_values(['sortkey', 'Drink'])\
           .drop('sortkey', axis=1)

print(df_out)

Output:

    Drink  Units
2    Wine      9
0    Beer     14
5  Whisky     12
3    Wine     15
1    Beer      5
6  Whisky     17
4    Wine      7
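
To see why this version comes out in rounds, it may help to look at the intermediate `sortkey` column on its own. The snippet below is just an inspection sketch; the names `keyed` and `rounds` are illustrative and not part of the code above.

```python
import pandas as pd

df = pd.DataFrame({'Drink': ['Beer', 'Beer', 'Wine', 'Wine', 'Wine', 'Whisky', 'Whisky'],
                   'Units': [14, 5, 9, 15, 7, 12, 17]})

# cumcount() restarts at 0 for every drink, so it numbers each drink's rows 0, 1, 2, ...
keyed = df.assign(sortkey=df.groupby('Drink', sort=False).cumcount())
print(keyed)

# Rows sharing a sortkey form one "round"; sorting by sortkey first emits a round at a time.
rounds = keyed.groupby('sortkey')['Drink'].apply(list)
print(rounds)
```

Within a round the rows are then ordered by the categorical Drink column, which is why the third Wine row ends up alone at the bottom.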
Scott Boston answered Sep 02 '25 18:09