Sort a pandas dataframe series by month name

I have a Series object that has:

    date   price
    dec      12
    may      15
    apr      13
    ..

Problem statement: I want to group the rows by month, compute the mean price for each month, and present the result sorted in calendar order.

Desired Output:

    month  mean_price
    Jan    XXX
    Feb    XXX
    Mar    XXX

I thought of making a list and passing it in a sort function:

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

but sort_values doesn't accept such a list for a Series.

One big problem I have is that even though

df.sort_values(by='date', ascending=True, inplace=True)

works on the initial df, the result of a later groupby doesn't keep the order of the sorted df.

To conclude, I need only these two columns from the initial data frame. I sorted by the datetime column, but after a groupby on the month name (dt.strftime('%B')) the order was lost. Now I have to sort the result by month name.


My code:

df  # has 5 columns, though I only need 'date' and 'price'

df.sort_values(by='date', inplace=True)  # at this point it is sorted by date, great
total = df.groupby(df['date'].dt.strftime('%B'))['price'].mean()  # but now the months appear alphabetically instead of in date order

J_p asked Dec 31 '17 13:12


2 Answers

You can use categorical data to enable proper sorting with pd.Categorical:

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", 
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
df['months'] = pd.Categorical(df['months'], categories=months, ordered=True)
df.sort_values(...)  # same as you have now; can use inplace=True

When you specify the categories, pandas remembers the order of specification as the default sort order.

Docs: Pandas categories > sorting & order.
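
Applied to the grouping from the question, this could look roughly like the sketch below. The sample data and the 'month' column name are made up for illustration, and the month labels are built from dt.month rather than strftime so the result does not depend on the system locale. Because the grouping key is an ordered categorical, groupby returns the groups in category order instead of alphabetically.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's 'date' and 'price' columns.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2017-12-05", "2017-05-10", "2017-04-02", "2017-05-20", "2017-01-15"]
    ),
    "price": [12, 15, 13, 17, 10],
})

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Map month numbers 1..12 to their abbreviations (locale-independent).
month_labels = df["date"].dt.month.map(dict(enumerate(months, start=1)))

# An ordered categorical remembers calendar order as its sort order.
df["month"] = pd.Categorical(month_labels, categories=months, ordered=True)

# observed=True keeps only the months that actually occur in the data;
# the groups come back in category (calendar) order.
mean_price = df.groupby("month", observed=True)["price"].mean()
print(mean_price)
```

Passing observed=False instead would keep all twelve months in the result, with NaN for months that have no rows.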

Brad Solomon answered Sep 22 '22 19:09


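You should consider reindexing along axis 0 (the index). This works when the index holds the full month names, as it does after the groupby with dt.strftime('%B'):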

new_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

df1 = df.reindex(new_order, axis=0)
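
As a rough end-to-end sketch of this approach on the grouped result from the question: the sample data is invented, and dt.month_name() is used here in place of strftime('%B') because it returns English names by default. Months missing from the data show up as NaN after reindex, so the sketch drops them.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's 'date' and 'price' columns.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2017-12-05", "2017-05-10", "2017-04-02", "2017-05-20", "2017-01-15"]
    ),
    "price": [12, 15, 13, 17, 10],
})

new_order = ["January", "February", "March", "April", "May", "June",
             "July", "August", "September", "October", "November", "December"]

# Grouping by month name yields an alphabetically sorted string index.
total = df.groupby(df["date"].dt.month_name())["price"].mean()

# reindex rearranges the rows into calendar order; months that are absent
# from the data become NaN and are removed by dropna().
ordered = total.reindex(new_order).dropna()
print(ordered)
```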

Abhay Singh answered Sep 20 '22 19:09