
How can I sort a boxplot in pandas by the median values?

I want to draw a boxplot of column Z in dataframe df by the categories X and Y. How can I sort the boxplot by the median, in descending order?

import pandas as pd
import random
n = 100
# this is probably a strange way to generate random data; please feel free to correct it
df = pd.DataFrame({"X": [random.choice(["A","B","C"]) for i in range(n)], 
                   "Y": [random.choice(["a","b","c"]) for i in range(n)],
                   "Z": [random.gauss(0,1) for i in range(n)]})
df.boxplot(column="Z", by=["X", "Y"])

Note that this question is very similar, but it uses a different data structure. I'm relatively new to pandas (and have only done some tutorials on Python in general), so I couldn't figure out how to make my data work with the answer posted there. This may well be more of a reshaping question than a plotting question. Maybe there is a solution using groupby?

Fred S asked Feb 20 '14 15:02



2 Answers

You can use the answer in "How to sort a boxplot by the median values in pandas", but first you need to group your data and create a new data frame with one column per group:

import pandas as pd
import random
import matplotlib.pyplot as plt

n = 100
# this is probably a strange way to generate random data; please feel free to correct it
df = pd.DataFrame({"X": [random.choice(["A","B","C"]) for i in range(n)], 
                   "Y": [random.choice(["a","b","c"]) for i in range(n)],
                   "Z": [random.gauss(0,1) for i in range(n)]})
grouped = df.groupby(["X", "Y"])

df2 = pd.DataFrame({col:vals['Z'] for col,vals in grouped})

meds = df2.median()
meds.sort_values(ascending=False, inplace=True)
df2 = df2[meds.index]
df2.boxplot()

plt.show()
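
The ordering that drives the plot can also be computed directly with groupby, which makes it easy to inspect before drawing anything. This is only a minimal sketch, assuming the same X, Y and Z columns as above; the fixed seed and the name `order` are illustrative additions, not part of the original answer.

```python
import random

import pandas as pd

random.seed(0)  # assumption: fixed seed so the result is reproducible
n = 100
df = pd.DataFrame({"X": [random.choice(["A", "B", "C"]) for _ in range(n)],
                   "Y": [random.choice(["a", "b", "c"]) for _ in range(n)],
                   "Z": [random.gauss(0, 1) for _ in range(n)]})

# Median of Z for each (X, Y) group, largest median first
meds = df.groupby(["X", "Y"])["Z"].median().sort_values(ascending=False)

# (X, Y) tuples in the order the boxes should appear
order = list(meds.index)
print(meds)
```

Here `order` plays the same role as `meds.index` in the code above: it is the column order to use when selecting from the reshaped data frame.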

Alvaro Fuentes answered Sep 23 '22 17:09


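A similar approach to Alvaro Fuentes' answer, wrapped in a function so it is easier to reuse. Note that it sorts by median in ascending order; pass ascending=False to sort_values() for the descending order asked for in the question: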

import pandas as pd

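def boxplot_sorted(df, by, column):
    # one column per group, holding that group's values of `column`
    df2 = pd.DataFrame({col: vals[column] for col, vals in df.groupby(by)})
    # group medians, smallest first
    meds = df2.median().sort_values()
    # reorder the columns by median and plot, rotating the tick labels
    df2[meds.index].boxplot(rot=90)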

boxplot_sorted(df, by=["X", "Y"], column="Z")
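
If the sort direction needs to be switchable, one possible variant draws the boxes with matplotlib directly and returns the order it used. This is a rough sketch rather than a drop-in replacement: the function name `boxplot_sorted_by_median`, its `ascending` parameter, the `demo` data frame and the headless `Agg` backend are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def boxplot_sorted_by_median(df, by, column, ascending=False):
    """Draw one box per group, ordered by the group's median of `column`."""
    # Collect each group's values as a NumPy array, keyed by the group label
    groups = {key: vals[column].to_numpy() for key, vals in df.groupby(by)}
    # Order the group labels by median; descending unless ascending=True
    order = sorted(groups, key=lambda k: np.median(groups[k]),
                   reverse=not ascending)
    fig, ax = plt.subplots()
    ax.boxplot([groups[k] for k in order])
    ax.set_xticklabels([str(k) for k in order], rotation=90)
    return ax, order


# Small hypothetical data set with known medians: A=2, B=8, C=5
demo = pd.DataFrame({"X": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
                     "Y": ["a"] * 3 + ["b"] * 3 + ["a"] * 3,
                     "Z": [1, 2, 3, 7, 8, 9, 4, 5, 6]})
ax, order = boxplot_sorted_by_median(demo, by=["X", "Y"], column="Z")
```

Returning `order` alongside the axes makes it possible to check the box order without looking at the figure.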
J Wang answered Sep 24 '22 17:09