
How to sort a boxplot by the median values in pandas

I have a DataFrame, outcome2, from which I generate a grouped boxplot as follows:

In [11]: outcome2.boxplot(column='Hospital 30-Day Death (Mortality) Rates from Heart Attack', by='State')
    ...: plt.ylabel('30 Day Death Rate')
    ...: plt.title('30 Day Death Rate by State')

[Figure: boxplot of the 30-day death rate by state, with the states in alphabetical order]

What I'd like to do is sort the plot by the median for each state, instead of alphabetically. Not sure how to go about doing so.

Chris asked Oct 19 '13 18:10




1 Answer

To sort by the median, compute the median of each column, sort the result, and use its Index to reorder the DataFrame's columns. Here df holds one column per state:

In [45]: df.iloc[:10, :5]
Out[45]:
      AK     AL     AR     AZ     CA
0  0.047  0.199  0.969 -0.205  1.053
1  0.206  0.132 -0.712  0.111 -0.254
2  0.638  0.233 -0.907  1.284  1.193
3  1.234  0.046  0.624  0.485 -0.048
4 -1.362 -0.559  1.108 -0.501  0.111
5  1.276 -0.954  0.653 -0.175 -0.287
6  0.524 -1.785 -0.887  1.354 -0.431
7  0.111  0.762 -0.514  0.808  0.728
8  1.301  0.619  0.957  1.542 -0.087
9 -0.892  2.327  1.363 -1.537  0.142

In [46]: med = df.median()

In [47]: med = med.sort_values()

In [48]: newdf = df[med.index]

In [49]: newdf.iloc[:10, :5]
Out[49]:
      PA     CT     LA     RI     MO
0 -0.667  0.774 -0.999 -0.938  0.155
1  0.822  0.390 -0.014 -2.228  0.570
2 -1.037  0.838 -0.673  2.038  0.809
3  0.620  2.845 -0.523 -0.151 -0.955
4 -0.918  1.043  0.613  0.698 -0.446
5 -0.767  0.869 -0.496 -0.925 -0.374
6 -0.495  0.437  1.245 -1.046  0.894
7 -1.283  0.358  0.016  0.137  0.511
8 -0.018 -0.047 -0.639 -0.385  0.080
9 -1.705  0.986  0.605  0.295  0.302

In [50]: med.head()
Out[50]:
PA   -0.117
CT   -0.077
LA   -0.072
RI   -0.069
MO   -0.053
dtype: float64

The resulting figure:

[Figure: boxplot with the states ordered by median]
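
The session above works on a DataFrame with one column per state, whereas the question's data has a State column that the boxplot is grouped by. A minimal sketch of the same idea for that layout follows. The data is synthetic, and the helper name boxplot_sorted and the column name Rate are hypothetical stand-ins rather than anything from the original post:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic long-format data standing in for outcome2 (made-up values).
rng = np.random.default_rng(0)
states = ["AK", "AL", "AR", "AZ", "CA"]
long_df = pd.DataFrame({
    "State": rng.choice(states, size=200),
    "Rate": rng.normal(loc=15.0, scale=2.0, size=200),  # hypothetical column
})


def boxplot_sorted(data, by, column, ascending=True):
    """Draw one box per group, ordered by each group's median (hypothetical helper)."""
    # Median per group, sorted; the sorted index is the desired box order.
    med = data.groupby(by)[column].median().sort_values(ascending=ascending)
    # Reshape to one column per group. Groups differ in size, so the
    # shorter columns are padded with NaN, which boxplot ignores.
    wide = pd.DataFrame({
        key: grp.reset_index(drop=True)
        for key, grp in data.groupby(by)[column]
    })
    # Reorder the columns by median; boxplot draws columns left to right.
    ax = wide[med.index].boxplot(rot=90)
    return ax, med


ax, med = boxplot_sorted(long_df, by="State", column="Rate")
ax.set_ylabel("30 Day Death Rate")
ax.set_title("30 Day Death Rate by State")
```

Call plt.show() or plt.savefig(...) afterwards to view the figure. Passing ascending=False to the helper would put the highest median first.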

Phillip Cloud answered Oct 19 '22 23:10