I have a DataFrame in pandas and would like to sort its columns (i.e. get a new DataFrame, or a view) by the mean value of each column (or, e.g., by their std value). The documentation talks about sorting by label or value, but I could not find anything on custom sorting methods.
How can I do this?
To sort a DataFrame by the values in a single column, use .sort_values(). By default, this returns a new DataFrame sorted in ascending order and does not modify the original DataFrame.
You can sort a pandas DataFrame by one or more columns using the sort_values() method, in ascending or descending order.
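As a minimal sketch of the sort_values() behaviour described above (the column names and values here are made up for illustration), note that this reorders rows by the values in the named columns; it does not reorder the columns themselves:

```python
import pandas as pd

# Illustrative data; the column names are assumptions, not from the question.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "group": [2, 1, 2, 1],
    "score": [3.0, 1.5, 2.0, 4.0],
})

# Sort rows by a single column (ascending by default); df itself is unchanged.
by_score = df.sort_values("score")

# Sort rows by several columns, with a separate direction for each one.
by_group_then_score = df.sort_values(["group", "score"], ascending=[True, False])

print(by_score)
print(by_group_then_score)
```

Passing a list to ascending lets each sort key have its own direction.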
You can use the mean DataFrame method and the Series sort_values method:
    In [11]: df = pd.DataFrame(np.random.randn(4,4), columns=list('ABCD'))

    In [12]: df
    Out[12]:
              A         B         C         D
    0  0.933069  1.432486  0.288637 -1.867853
    1 -0.455952 -0.725268  0.339908  1.318175
    2 -0.894331  0.573868  1.116137  0.508845
    3  0.661572  0.819360 -0.527327 -0.925478

    In [13]: df.mean()
    Out[13]:
    A    0.061089
    B    0.525112
    C    0.304339
    D   -0.241578
    dtype: float64

    In [14]: df.mean().sort_values()
    Out[14]:
    D   -0.241578
    A    0.061089
    C    0.304339
    B    0.525112
    dtype: float64

Then you can reorder the columns using reindex:
    In [15]: df.reindex(df.mean().sort_values().index, axis=1)
    Out[15]:
              D         A         C         B
    0 -1.867853  0.933069  0.288637  1.432486
    1  1.318175 -0.455952  0.339908 -0.725268
    2  0.508845 -0.894331  1.116137  0.573868
    3 -0.925478  0.661572 -0.527327  0.819360

Note: In earlier versions of pandas, sort_values was called order; order was deprecated in 0.17 to be more consistent with the other sorting methods. Also, in earlier versions, one had to use reindex_axis rather than reindex.
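The same idea extends to other statistics such as std, which the question also mentions. The following is a sketch only: sort_columns_by is a hypothetical helper name, not a pandas function, and it assumes all columns are numeric and column labels are unique.

```python
import pandas as pd

def sort_columns_by(df, stat="mean", ascending=True):
    """Return a new DataFrame with columns ordered by a per-column statistic.

    Hypothetical helper for illustration; `stat` is anything DataFrame.agg
    accepts that reduces each column to one value (e.g. "mean", "std").
    """
    scores = df.agg(stat)                                # one value per column
    order = scores.sort_values(ascending=ascending).index
    return df.reindex(columns=order)                     # new object, df untouched

# Fixed values so the resulting order is easy to check by hand.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0],     # mean 2,  std 1
    "B": [10.0, 10.0, 10.0],  # mean 10, std 0
    "C": [-5.0, 0.0, 5.0],    # mean 0,  std 5
})

by_mean = sort_columns_by(df)          # column order: C, A, B
by_std = sort_columns_by(df, "std")    # column order: B, A, C

print(by_mean)
print(by_std)
```

reindex returns a new DataFrame rather than a view, so the original column order is preserved in df.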
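You can use assign to create a temporary column, sort by it, and drop it again in a single line of code. Note that with axis=1 this sorts the rows by their row means rather than reordering the columns.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(4,4), columns=list('ABCD'))
    df.assign(m=df.mean(axis=1)).sort_values('m').drop('m', axis=1)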
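To make the effect of the assign one-liner concrete, here is a small sketch with fixed values (the index labels are made up for illustration). It shows that the rows are reordered by row mean while the columns keep their order, and compares it with an equivalent approach that avoids the temporary column:

```python
import pandas as pd

# Illustrative data: row means are x=3, y=1, z=2.
df = pd.DataFrame(
    {"A": [2.0, 0.0, 3.0], "B": [4.0, 2.0, 1.0]},
    index=["x", "y", "z"],
)

# The one-liner from the answer: temporary column "m" holds each row's mean.
# Caveat: this would overwrite an existing column that is already named "m".
sorted_rows = df.assign(m=df.mean(axis=1)).sort_values("m").drop("m", axis=1)

# Equivalent alternative without a temporary column: select rows by sorted index.
sorted_rows_alt = df.loc[df.mean(axis=1).sort_values().index]

print(sorted_rows)
```

The rows come out in the order y, z, x, and the columns stay as A, B.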