Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sort pandas dataframe by two date columns

I have a pandas dataframe like this:

   column_year  column_Month  a_integer_column
0   2014        April         25.326531
1   2014        August        25.544554
2   2015        December      25.678261
3   2014        February      24.801187
4   2014        July          24.990338
...  ...           ...           ...
68  2018        November      26.024931
69  2017        October       25.677333
70  2019        September     24.432361
71  2020        February      25.383648
72  2020        January       25.504831

I now want to sort year column first and then month column, like this below:

   column_year  column_Month  a_integer_column
3   2014        February      24.801187
0   2014        April         25.326531
4   2014        July          24.990338
1   2014        August        25.544554
2   2015        December      25.678261
...  ...           ...            ...
69  2017        October       25.677333
68  2018        November      26.024931
70  2019        September     24.432361
72  2020        January       25.504831
71  2020        February      25.383648

How do i do this?

like image 784
Naveen Reddy Marthala Avatar asked Apr 22 '20 13:04

Naveen Reddy Marthala


2 Answers

Let us try to_datetime + argsort:

df=df.iloc[pd.to_datetime(df.column_year.astype(str)+df.column_Month,format='%Y%B').argsort()]
   column_year column_Month  a_integer_column
3         2014     February         24.801187
0         2014        April         25.326531
4         2014         July         24.990338
1         2014       August         25.544554
2         2015     December         25.678261
like image 89
BENY Avatar answered Oct 03 '22 07:10

BENY


You can change the column_Month column into a CategoricalDtype

Months = pd.CategoricalDtype([
    'January', 'February', 'March', 'April', 'May', 'June',
    'July', 'August', 'September', 'October', 'November', 'December'
], ordered=True)

df.astype({'column_Month': Months}).sort_values(['column_year', 'column_Month'])

    column_year column_Month  a_integer_column
3          2014     February         24.801187
0          2014        April         25.326531
4          2014         July         24.990338
1          2014       August         25.544554
2          2015     December         25.678261
69         2017      October         25.677333
68         2018     November         26.024931
70         2019    September         24.432361
72         2020      January         25.504831
71         2020     February         25.383648
like image 31
piRSquared Avatar answered Oct 03 '22 07:10

piRSquared