I have a Series object that looks like this:
date   price
dec    12
may    15
apr    13
...
Problem statement: I want to group the data by month, compute the mean price for each month, and present the result sorted in calendar order.
Desired Output:
month mean_price
Jan XXX
Feb XXX
Mar XXX
I thought of making a list of month names and passing it to a sort function:
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
but sort_values doesn't seem to support a custom order like that for a Series.
One big problem I have is that, even though
df.sort_values(by='date', ascending=True, inplace=True)
works on the initial df, after I did a groupby the result didn't keep the order of the sorted df.
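A side note on inplace: with inplace=True, sort_values sorts the frame itself and returns None. Writing df = df.sort_values(..., inplace=True) would therefore replace df with None. The sketch below uses small made-up data, not the question's real frame:

```python
import pandas as pd

# Hypothetical sample data, only for illustration
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-12-05", "2021-05-10", "2021-04-20"]),
    "price": [12, 15, 13],
})

# With inplace=True, df itself is reordered and the call returns None
result = df.sort_values(by="date", ascending=True, inplace=True)
print(result)                # None
print(df["price"].tolist())  # [13, 15, 12], rows now in date order

# Without inplace, keep the returned sorted copy instead
df_sorted = df.sort_values(by="date", ascending=True)
```

Use one style or the other: either inplace=True without assignment, or assignment without inplace.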
To summarize: I need these two columns from the initial data frame. I sorted the datetime column, but after a groupby on the month name (dt.strftime('%B')) the ordering was lost. Now I have to sort the result by month name.
My code:
df  # has 5 columns, but I only need 'date' and 'price'
df.sort_values(by='date', inplace=True)  # at this point it is sorted by date, great
total = df.groupby(df['date'].dt.strftime('%B'))['price'].mean()  # but now the months appear alphabetically instead
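The reordering comes from groupby, which sorts the group keys by default (sort=True). Here the keys are month-name strings, so they are sorted alphabetically. Passing sort=False keeps the groups in order of first appearance, which for a date-sorted frame is chronological. The sketch assumes made-up data that falls within a single calendar year and an English locale for '%B':

```python
import pandas as pd

# Hypothetical sample data, all within one calendar year
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-12-05", "2021-05-10", "2021-04-20", "2021-12-25"]),
    "price": [12, 15, 13, 14],
})
df = df.sort_values(by="date")

# Default sort=True: month-name keys are sorted as strings (alphabetically)
alpha = df.groupby(df["date"].dt.strftime("%B"))["price"].mean()
print(alpha.index.tolist())   # ['April', 'December', 'May']

# sort=False: groups appear in order of first appearance in the date-sorted frame
chrono = df.groupby(df["date"].dt.strftime("%B"), sort=False)["price"].mean()
print(chrono.index.tolist())  # ['April', 'May', 'December']
```

If the data spans several years, first-appearance order is chronological rather than calendar order (for example, December 2020 would come before January 2021). In that case one of the explicit month-ordering approaches is the safer choice.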
Another approach is to sort by month using sort_values() with a dictionary that maps each month name to its month number. The result is a DataFrame whose rows are sorted in calendar order by month name.
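One way to apply such a dictionary is the key argument of sort_values and sort_index (available since pandas 1.1). The key function receives the column (or index) and returns values to sort by. The month_dict name and the sample data below are hypothetical:

```python
import pandas as pd

# Hypothetical month-name -> month-number mapping used as a sort key
month_dict = {"January": 1, "February": 2, "March": 3, "April": 4,
              "May": 5, "June": 6, "July": 7, "August": 8,
              "September": 9, "October": 10, "November": 11, "December": 12}

# DataFrame with a month-name column: key receives the 'month' column as a Series
df = pd.DataFrame({"month": ["May", "December", "April"],
                   "mean_price": [15.0, 13.0, 13.0]})
df_sorted = df.sort_values(by="month", key=lambda col: col.map(month_dict))
print(df_sorted["month"].tolist())  # ['April', 'May', 'December']

# Series indexed by month name (like a groupby result): sort the index instead
s = pd.Series([15.0, 13.0, 13.0], index=["May", "December", "April"])
s_sorted = s.sort_index(key=lambda idx: idx.map(month_dict))
print(s_sorted.index.tolist())      # ['April', 'May', 'December']
```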
The strftime() method takes a format string and returns a string representing the datetime in that format. You can use %Y and %m as format codes to extract the year and month, respectively, from a pandas datetime column.
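A small sketch of %Y and %m via the .dt accessor, using made-up dates. Because %m is zero-padded ('04', '05', '12'), plain string sorting of these values already matches calendar order:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-12-05", "2022-05-10", "2021-04-20"]),
    "price": [12, 15, 13],
})

df["year"] = df["date"].dt.strftime("%Y")   # '2021', '2022', '2021'
df["month"] = df["date"].dt.strftime("%m")  # zero-padded: '12', '05', '04'

# Zero-padded month strings sort in calendar order, so the default
# (sorted) groupby already returns the months in the right order
by_month = df.groupby("month")["price"].mean()
print(by_month.index.tolist())  # ['04', '05', '12']
```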
Sort the Series in ascending order: by default, the pandas Series.sort_values() function sorts in ascending order. You can also pass ascending=True to make this explicit. Any NaN values in the Series are placed at the end.
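A quick illustration with a made-up Series. The na_position parameter (default 'last') controls where NaN values go:

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, np.nan, 1.0, 2.0])

# ascending=True is the default; NaN goes last (na_position='last')
asc = s.sort_values()
print(asc.tolist())        # [1.0, 2.0, 3.0, nan]

# Put NaN values first instead
nan_first = s.sort_values(na_position="first")
print(nan_first.tolist())  # [nan, 1.0, 2.0, 3.0]
```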
You can use categorical data with pd.Categorical to enable proper sorting:
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
df['months'] = pd.Categorical(df['months'], categories=months, ordered=True)
df.sort_values(...) # same as you have now; can use inplace=True
When you specify the categories with ordered=True, pandas uses the order in which they are listed as the sort order.
Docs: pandas Categorical data > Sorting and order.
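Applied to the question's groupby result, a sketch might look like the following. The question groups by '%B' (full names such as "December"), but the categories list uses abbreviations, so this version groups by '%b' (abbreviated names, assuming an English locale) so that the two match. The sample data is made up:

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical sample data
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-12-05", "2021-05-10", "2021-04-20",
                            "2021-12-25", "2021-01-15"]),
    "price": [12, 15, 13, 14, 10],
})

# '%b' yields abbreviated month names that match the categories above
mean_price = df.groupby(df["date"].dt.strftime("%b"))["price"].mean()

# Make the index an ordered categorical so sorting follows the months list
mean_price.index = pd.CategoricalIndex(mean_price.index,
                                       categories=months, ordered=True)
mean_price = mean_price.sort_index()
print(mean_price)
```

If you prefer full month names, keep '%B' and list the full names as the categories instead.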
Alternatively, you could reindex the result along axis 0 (the index), listing the month names in calendar order:
new_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
df1 = df.reindex(new_order, axis=0)
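A sketch of the reindex approach on the Series produced by the question's '%B' groupby, using made-up data. reindex returns one row per label in new_order and fills months that have no data with NaN. The dropna() call, an addition here, removes those rows:

```python
import pandas as pd

new_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
             'August', 'September', 'October', 'November', 'December']

# Hypothetical sample data
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-12-05", "2021-05-10", "2021-04-20"]),
    "price": [12, 15, 13],
})
total = df.groupby(df["date"].dt.strftime("%B"))["price"].mean()

# Reorder to calendar order; months without data become NaN and are dropped
total = total.reindex(new_order).dropna()
print(total.index.tolist())  # ['April', 'May', 'December']
```

Note that the labels must match exactly, so this assumes an English locale for '%B'.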