Pandas: Sorting columns by their mean value

Tags:

python

pandas

I have a DataFrame in pandas and I would like to sort its columns (i.e. get a new DataFrame, or a view) according to the mean value of each column (or, e.g., by their standard deviation). The documentation talks about sorting by label or by value, but I could not find anything on custom sorting methods.

How can I do this?

Amelio Vazquez-Reina asked Jul 17 '13 23:07





2 Answers

You can use the mean DataFrame method and the Series sort_values method:

```
In [11]: df = pd.DataFrame(np.random.randn(4,4), columns=list('ABCD'))

In [12]: df
Out[12]:
          A         B         C         D
0  0.933069  1.432486  0.288637 -1.867853
1 -0.455952 -0.725268  0.339908  1.318175
2 -0.894331  0.573868  1.116137  0.508845
3  0.661572  0.819360 -0.527327 -0.925478

In [13]: df.mean()
Out[13]:
A    0.061089
B    0.525112
C    0.304339
D   -0.241578
dtype: float64

In [14]: df.mean().sort_values()
Out[14]:
D   -0.241578
A    0.061089
C    0.304339
B    0.525112
dtype: float64
```

Then you can reorder the columns using reindex:

```
In [15]: df.reindex(df.mean().sort_values().index, axis=1)
Out[15]:
          D         A         C         B
0 -1.867853  0.933069  0.288637  1.432486
1  1.318175 -0.455952  0.339908 -0.725268
2  0.508845 -0.894331  1.116137  0.573868
3 -0.925478  0.661572 -0.527327  0.819360
```

Note: In earlier versions of pandas, `sort_values` was called `order`; `order` was deprecated in 0.17 to make it consistent with the other sorting methods. Also, earlier versions required `reindex_axis` rather than `reindex`.
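The same idea works for any per-column statistic, including the standard deviation mentioned in the question. Below is a minimal sketch, assuming a pandas version recent enough to accept `reindex(..., axis=1)`. `sort_columns_by` is a hypothetical helper name, not a pandas method.

```python
import numpy as np
import pandas as pd


def sort_columns_by(df: pd.DataFrame, stat: str = "mean", ascending: bool = True) -> pd.DataFrame:
    """Hypothetical helper: return df with its columns reordered by a per-column statistic.

    `stat` is any reduction name accepted by DataFrame.agg, e.g. "mean" or "std".
    """
    # One value per column, equivalent to df.mean() or df.std()
    scores = df.agg(stat)
    # Sort the column labels by their score, then reorder the columns
    order = scores.sort_values(ascending=ascending).index
    return df.reindex(order, axis=1)


df = pd.DataFrame(np.random.randn(4, 4), columns=list("ABCD"))
print(sort_columns_by(df))                          # by mean, ascending
print(sort_columns_by(df, "std", ascending=False))  # by std, descending
# Plain column selection gives the same result as reindex here
print(df[df.mean().sort_values().index])
```

Selecting with `df[...]` and reindexing agree here because every label in the sorted index already exists in `df`. For labels that do not exist, `reindex` would insert all-NaN columns, whereas `df[...]` would raise a `KeyError`.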

Andy Hayden answered Oct 10 '22 11:10



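You can use assign to create a temporary column, sort by it, and drop it again, all in one line of code. Note that df.mean(axis=1) computes one mean per row, so this sorts the rows by their mean rather than reordering the columns as the question asks.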

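```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4, 4), columns=list('ABCD'))
# Add the row means as column 'm', sort the rows by it, then drop the helper column
df.assign(m=df.mean(axis=1)).sort_values('m').drop('m', axis=1)
```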
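If a row-wise ordering is what you want, the helper column is not strictly necessary. You can sort the row means and reuse their index. This is a minimal sketch that compares both forms. It assumes the row index is unique, because `.loc` selects by label and would repeat rows for duplicate labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4, 4), columns=list("ABCD"))

# Temporary-column approach: sorts the ROWS by their mean
by_assign = df.assign(m=df.mean(axis=1)).sort_values("m").drop("m", axis=1)

# Same row order without a helper column: sort the row means, then select rows by that index
by_index = df.loc[df.mean(axis=1).sort_values().index]

print(by_assign.equals(by_index))
```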
Adriel M. Vieira answered Oct 10 '22 10:10
