Sort column in Pandas DataFrame by specific order

Tags: python, pandas

Given this DataFrame:

df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])

I would like to sort column A in this order: July, August, Sept. Is there some way to use a sort function like "sort_values" but pre-define the sort order by values?

sparrow asked Oct 12 '18 17:10


7 Answers

Using Categorical

df['A'] = pd.Categorical(df['A'], categories=['July', 'August', 'Sept'])
df = df.sort_values('A')
df
Out[310]: 
        A  B
1    July  3
0  August  2
2    Sept  6
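
As a variation on the above, passing ordered=True makes the custom order usable for comparisons as well as sorting. This is a minimal sketch, not part of the original answer; the names month_order, sorted_df and later_than_july are made up for illustration. Note that any value missing from the categories list becomes NaN.

```python
import pandas as pd

df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])

# Hypothetical custom order; ordered=True also enables comparisons
# such as df['A'] > 'July'.
month_order = ['July', 'August', 'Sept']
df['A'] = pd.Categorical(df['A'], categories=month_order, ordered=True)

# Sorting follows the category order, not alphabetical order.
sorted_df = df.sort_values('A')

# Rows whose month comes after 'July' in the custom order.
later_than_july = df[df['A'] > 'July']
```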
BENY answered Oct 20 '22 18:10


Define the order in a dictionary and sort according to it

sort_dict = {'July':0,'August':1,'Sept':2}
df.loc[df['A'].map(sort_dict).sort_values().index]

Output

       A    B
1   July    3
0   August  2
2   Sept    6
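
One thing to watch with this approach: sort_values().index yields index labels, not positions, so label-based selection is the safer choice when the DataFrame does not have the default 0..n-1 index. The sketch below assumes a made-up string index ('x', 'y', 'z') to show the difference; by_label and by_position are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    [['August', 2], ['July', 3], ['Sept', 6]],
    columns=['A', 'B'],
    index=['x', 'y', 'z'],  # hypothetical non-default index
)
sort_dict = {'July': 0, 'August': 1, 'Sept': 2}

# sort_values() returns the index *labels* in sorted order, so select with .loc.
by_label = df.loc[df['A'].map(sort_dict).sort_values().index]

# argsort() returns integer positions, which is what .iloc expects.
by_position = df.iloc[df['A'].map(sort_dict).argsort().to_numpy()]
```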
Yuca answered Oct 20 '22 17:10


You can map each value in the column to a sort key of your own, sort by that key, then drop the helper column:

df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])
value_map = {'August': 1, 'July': 0, 'Sept': 2}
def sort_by_key(df, col, value_map):
    df = df.assign(sort = lambda df: df[col].map(value_map))
    return df.sort_values('sort') \
             .drop('sort', axis='columns')

sort_by_key(df, 'A', value_map)

Results in:

        A  B
1    July  3
0  August  2
2    Sept  6
PMende answered Oct 20 '22 18:10


Are you opposed to using either complete month names or consistent abbreviations?

df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])

df

import calendar

df = df.replace({'Sept':'September'})

calendar.month_name[1:]

Output:

['January',
 'February',
 'March',
 'April',
 'May',
 'June',
 'July',
 'August',
 'September',
 'October',
 'November',
 'December']

df['A'] = pd.Categorical(df.A, categories=calendar.month_name[1:], ordered=True)

df.sort_values('A')

Output:

           A  B
1       July  3
0     August  2
2  September  6

Or use calendar.month_abbr

calendar.month_abbr[1:]

Output:

['Jan',
 'Feb',
 'Mar',
 'Apr',
 'May',
 'Jun',
 'Jul',
 'Aug',
 'Sep',
 'Oct',
 'Nov',
 'Dec']
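
The abbreviation route can be sketched as follows. This is an illustration rather than the answerer's code, and it rests on the assumption that the first three letters of every value match an English entry in calendar.month_abbr (calendar names depend on the active locale). The names abbr and sorted_df are hypothetical.

```python
import calendar
import pandas as pd

df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])

# Assumption: the first three letters of each value match an English
# abbreviation in calendar.month_abbr ('Aug', 'Jul', 'Sep').
abbr = pd.Categorical(
    df['A'].str[:3],
    categories=calendar.month_abbr[1:],
    ordered=True,
)

# argsort() gives the row positions in calendar order; column A is left unchanged.
sorted_df = df.iloc[abbr.argsort()]
```

Because the categorical is only used to compute the row order, the original spellings in column A ('Sept' included) survive the sort.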
Scott Boston answered Oct 20 '22 17:10


Temporarily convert the month strings to datetime and sort on the month number:

df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])
df['tmp'] = pd.to_datetime(df['A'].str[:3], format='%b').dt.month
df.sort_values(by='tmp').drop(columns='tmp')


    A       B
1   July    3
0   August  2
2   Sept    6
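
On pandas 1.1.0 or later, the same conversion can probably be done without a temporary column by passing it as the key argument of sort_values. A minimal sketch, assuming the first three letters of each value parse with the '%b' format in the current locale; month_number is a made-up helper name.

```python
import pandas as pd

df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])

def month_number(s):
    # Hypothetical helper: parse the 3-letter prefix ('Aug', 'Jul', 'Sep')
    # and return the month number as a Series aligned with s.
    return pd.to_datetime(s.str[:3], format='%b').dt.month

# key is applied to column A only for ordering; the data itself is untouched.
sorted_df = df.sort_values('A', key=month_number)
```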
Vaishali answered Oct 20 '22 17:10


I changed your 'Sept' to 'September' to keep it consistent with the naming of the other months.

Then I built an ordered list of month names with pd.date_range.

I filtered that list down to the values present in your column, which preserves calendar order.

Finally, I made a categorical from that sublist and sorted on it:

import pandas as pd


df = pd.DataFrame([['August', 2], ['July', 3], ['September', 6]], columns=['A', 'B'])

full_month_list = pd.date_range('2018-01-01', '2018-12-01', freq='MS').strftime("%B").tolist()
partial_month_list = [x for x in full_month_list if x in df['A'].values]
df['A'] = pd.Categorical(df['A'], partial_month_list)

df.sort_values('A')

Results in:

    A           B
1   July        3
0   August      2
2   September   6
Declan answered Oct 20 '22 17:10


Since pandas 1.1.0, sort_values supports sorting by a key function:

df = df.sort_values('A', key=lambda s: s.apply(['July', 'August', 'Sept'].index), ignore_index=True)
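
A variation on the key approach, sketched here with a dictionary lookup instead of list.index: list.index raises ValueError for a value that is not in the list, whereas Series.map returns NaN for it, and sort_values places NaN keys last by default. The name rank is made up for this example.

```python
import pandas as pd

df = pd.DataFrame([['August', 2], ['July', 3], ['Sept', 6]], columns=['A', 'B'])

# Hypothetical lookup table: value -> rank in the desired order.
rank = {name: i for i, name in enumerate(['July', 'August', 'Sept'])}

# key receives the whole column as a Series and must return
# a Series of the same length.
sorted_df = df.sort_values('A', key=lambda s: s.map(rank), ignore_index=True)
```

With ignore_index=True the result is renumbered 0..n-1; leave it out to keep the original row labels.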
Charles Liu answered Oct 20 '22 17:10