Given this DataFrame:
df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])
I would like to sort column A in this order: July, August, Sept. Is there some way to use a sort function like "sort_values" but pre-define the sort order by values?
Using Categorical
df['A'] = pd.Categorical(df['A'], categories=['July', 'August', 'Sept'])
df = df.sort_values('A')
df
Out[310]:
A B
1 July 3
0 August 2
2 Sept 6
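One caveat with the Categorical approach: any value that is not in the category list is converted to NaN. The sketch below illustrates this with an extra 'Oct' row, which is a hypothetical addition and not part of the original question. It also passes ordered=True, which is optional for sorting but makes comparisons such as df['A'] < 'Sept' possible.

```python
import pandas as pd

# Frame from the question plus a hypothetical 'Oct' row that is NOT in the category list.
df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6], ['Oct', 1]], columns=['A', 'B'])

order = ['July', 'August', 'Sept']
df['A'] = pd.Categorical(df['A'], categories=order, ordered=True)

# 'Oct' has become NaN; sort_values places NaN last by default (na_position='last').
result = df.sort_values('A')
print(result)
```

If silent NaN values would be a problem, it may be worth checking df['A'].isna() right after the conversion.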
Define the order in a dictionary and sort according to it
sort_dict = {'July':0,'August':1,'Sept':2}
df.loc[df['A'].map(sort_dict).sort_values().index]
Output
A B
1 July 3
0 August 2
2 Sept 6
You can map each value to a sort key in a helper column, sort by that column, then drop it:
df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])
value_map = {'August': 1, 'July': 0, 'Sept': 2}
def sort_by_key(df, col, value_map):
    df = df.assign(sort=lambda df: df[col].map(value_map))
    return df.sort_values('sort') \
             .drop('sort', axis='columns')
sort_by_key(df, 'A', value_map)
Results in:
A B
1 July 3
0 August 2
2 Sept 6
Are you opposed to using either complete month names or consistent abbreviations?
df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])
df
import calendar
df = df.replace({'Sept':'September'})
calendar.month_name[1:]
Output:
['January',
'February',
'March',
'April',
'May',
'June',
'July',
'August',
'September',
'October',
'November',
'December']
df['A'] = pd.Categorical(df.A, categories=calendar.month_name[1:], ordered=True)
df.sort_values('A')
Output:
A B
1 July 3
0 August 2
2 September 6
Or use calendar.month_abbr
calendar.month_abbr[1:]
Output:
['Jan',
'Feb',
'Mar',
'Apr',
'May',
'Jun',
'Jul',
'Aug',
'Sep',
'Oct',
'Nov',
'Dec']
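A minimal sketch of how the abbreviation list could be used without renaming anything in column A. It assumes that the first three letters of every value match an English abbreviation from calendar.month_abbr (true for 'July', 'August' and 'Sept') and that the default English locale is in effect. The helper column name _month is hypothetical.

```python
import calendar
import pandas as pd

df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])

# ['Jan', 'Feb', ..., 'Dec'] in calendar order (index 0 is an empty string, so skip it).
abbr_order = calendar.month_abbr[1:]

# Build an ordered categorical from the first three letters of each month name.
short = pd.Categorical(df['A'].str[:3], categories=abbr_order, ordered=True)

# Sort on a temporary helper column, then drop it so column A keeps its original spelling.
result = df.assign(_month=short).sort_values('_month').drop(columns='_month')
print(result)
```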
Temporarily convert the month string to a month number via datetime, sort by it, then drop the helper column
df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])
df['tmp'] = pd.to_datetime(df['A'].str[:3], format='%b').dt.month
df.sort_values(by='tmp').drop(columns='tmp')
A B
1 July 3
0 August 2
2 Sept 6
I changed your 'Sept' to 'September' to keep it consistent with the naming of the other months.
Then I built an ordered list of all twelve month names with pd.date_range.
I filtered that list down to the months present in your column, which preserves calendar order.
Finally, I made a Categorical from the filtered list and sorted on it:
import pandas as pd
df = pd.DataFrame([['August', 2], ['July', 3], ['September', 6]], columns=['A', 'B'])
full_month_list = pd.date_range('2018-01-01','2019-01-01', freq='MS').strftime("%B").tolist()
partial_month_list = [x for x in full_month_list if x in df['A'].values]
df['A'] = pd.Categorical(df['A'], partial_month_list)
df.sort_values('A')
Results in:
A B
1 July 3
0 August 2
2 September 6
Since pandas 1.1.0, sort_values supports sorting by a key function.
df = df.sort_values('A', key=lambda s: s.apply(['July', 'August', 'Sept'].index), ignore_index=True)
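The key function receives the whole column as a Series and must return a Series of the same length, so a dictionary lookup with map also works and avoids a list search per row. A minimal sketch, assuming pandas 1.1.0 or later; the name rank is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])

# Hypothetical lookup table: month name -> position in the desired order.
rank = {'July': 0, 'August': 1, 'Sept': 2}

# key is applied to column A as a whole; rows are ordered by the mapped integers.
result = df.sort_values('A', key=lambda s: s.map(rank))
print(result)
```

Unlike the ignore_index=True variant, this keeps the original index labels, so the rows come back labeled 1, 0, 2.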