I have a DataFrame in pandas, and I would like to sort its columns (i.e. get a new DataFrame, or a view) according to the mean value of each column (or e.g. by their std value). The documentation talks about sorting by label or value, but I could not find anything on custom sorting methods.
How can I do this?
To sort the DataFrame based on the values in a single column, you use .sort_values(). By default, this returns a new DataFrame sorted in ascending order and does not modify the original DataFrame (unless you pass inplace=True).
You can sort a pandas DataFrame by one or more columns, in ascending or descending order, using the sort_values() method.
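A minimal sketch of sort_values on rows, assuming a small hand-made DataFrame (the column names A and B and their values are illustrative only). It sorts by one column, then by two columns with a different direction for each:

```python
import pandas as pd

# Illustrative data; any DataFrame with sortable columns works the same way
df = pd.DataFrame({"A": [3, 1, 2, 1], "B": [9, 7, 8, 6]})

# Sort rows by a single column, ascending (the default); returns a new DataFrame
by_a = df.sort_values("A")

# Sort rows by several columns: A ascending, then B descending to break ties in A
by_a_then_b = df.sort_values(["A", "B"], ascending=[True, False])

print(by_a)
print(by_a_then_b)
```

Passing a list to ascending gives one direction per sort key; df itself is left unchanged.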
You can use the mean DataFrame method and the Series sort_values method:
In [11]: df = pd.DataFrame(np.random.randn(4,4), columns=list('ABCD'))

In [12]: df
Out[12]:
          A         B         C         D
0  0.933069  1.432486  0.288637 -1.867853
1 -0.455952 -0.725268  0.339908  1.318175
2 -0.894331  0.573868  1.116137  0.508845
3  0.661572  0.819360 -0.527327 -0.925478

In [13]: df.mean()
Out[13]:
A    0.061089
B    0.525112
C    0.304339
D   -0.241578
dtype: float64

In [14]: df.mean().sort_values()
Out[14]:
D   -0.241578
A    0.061089
C    0.304339
B    0.525112
dtype: float64
Then you can reorder the columns using reindex:
In [15]: df.reindex(df.mean().sort_values().index, axis=1)
Out[15]:
          D         A         C         B
0 -1.867853  0.933069  0.288637  1.432486
1  1.318175 -0.455952  0.339908 -0.725268
2  0.508845 -0.894331  1.116137  0.573868
3 -0.925478  0.661572 -0.527327  0.819360
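The same pattern works for any per-column statistic, such as the std mentioned in the question. The helper below, sort_columns_by, is a hypothetical name used only for illustration, not a pandas method: it applies a function to each column, sorts the resulting scores, and selects the columns in that order (equivalent to the reindex call above). The data is made up so that the std order and the mean order differ:

```python
import pandas as pd


def sort_columns_by(df: pd.DataFrame, func, ascending: bool = True) -> pd.DataFrame:
    """Hypothetical helper: return a new DataFrame with columns ordered by func(column).

    func receives each column as a Series and must return a scalar.
    """
    # One scalar per column, indexed by column label
    scores = df.apply(func)
    # Column labels in sorted order of their score
    order = scores.sort_values(ascending=ascending).index
    # Selecting by the ordered labels is equivalent to df.reindex(order, axis=1)
    return df[order]


df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [10.0, 10.0, 10.0], "C": [4.0, 6.0, 8.0]}
)

by_std = sort_columns_by(df, lambda s: s.std())  # std ascending: B, A, C
by_mean_desc = sort_columns_by(df, lambda s: s.mean(), ascending=False)  # mean descending: B, C, A

print(by_std)
print(by_mean_desc)
```

Because func is an arbitrary callable, this also covers "custom" orderings (e.g. lambda s: s.max() - s.min() for the range).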
Note: In earlier versions of pandas, sort_values used to be called order, but order was deprecated in 0.17 to be more consistent with the other sorting methods. Also, in earlier versions, one had to use reindex_axis rather than reindex.
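You can use assign to create a temporary column, sort by it, and drop it in the same line of code. Note that with mean(axis=1) this sorts the rows (by each row's mean across the columns), not the columns.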
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(4,4), columns=list('ABCD'))
df.assign(m=df.mean(axis=1)).sort_values('m').drop('m', axis=1)  # 'm' = temporary row-mean column
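A sketch of the same assign/sort/drop idea applied to columns: transpose, sort, and transpose back. This assumes a purely numeric DataFrame (transposing a frame with mixed dtypes can upcast everything to object), and the temporary column name m is arbitrary:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4, 4), columns=list("ABCD"))

# After .T each original column is a row; df.mean() is indexed by the same
# labels A..D, so assign aligns each column's mean with its row.
cols_sorted = (
    df.T
    .assign(m=df.mean())
    .sort_values("m")
    .drop(columns="m")
    .T
)

print(cols_sorted)
```

This gives the same column order as df.reindex(df.mean().sort_values().index, axis=1), which is the more direct option when you only need to reorder columns.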