EDIT: this question arose back in 2013 with pandas ~0.13 and was obsoleted by direct support for boxplot somewhere between version 0.15-0.18 (as per @Cireo's late answer; also pandas greatly improved support for categorical since this was asked.)
I can get a boxplot of a salary column in a pandas DataFrame...
train.boxplot(column='Salary', by='Category', sym='')
...however I can't figure out how to define the index-order used on column 'Category' - I want to supply my own custom order, according to another criterion:
category_order_by_mean_salary = train.groupby('Category')['Salary'].mean().order().keys()
How can I apply my custom column order to the boxplot columns? (other than ugly kludging the column names with a prefix to force ordering)
'Category' is a string (really, should be a categorical, but this was back in 0.13, where categorical was a third-class citizen) column taking 27 distinct values: ['Accounting & Finance Jobs','Admin Jobs',...,'Travel Jobs']. So it can be easily factorized with pd.Categorical.from_array()
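For readers on a recent pandas, here is a minimal sketch of the ordering step. It assumes a tiny made-up `train` frame (the real data is not shown in the question), uses `sort_values()` in place of the old `.order()`, and uses the `pd.Categorical(...)` constructor in place of `from_array()`; as far as I know neither of the older calls exists in current pandas. The names `category_order` and `plotted_order` are only illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the question's data (three categories, not 27)
train = pd.DataFrame({
    "Category": ["Admin Jobs", "IT Jobs", "Travel Jobs"] * 4,
    "Salary": [20, 50, 30, 22, 52, 32, 24, 54, 34, 26, 56, 36],
})

# Custom order: categories sorted by their mean salary, lowest first
mean_salary = train.groupby("Category")["Salary"].mean().sort_values()
category_order = list(mean_salary.index)

# Store the order on the column itself as an ordered categorical
train["Category"] = pd.Categorical(
    train["Category"], categories=category_order, ordered=True
)

# Grouping on a categorical column yields groups in category order,
# not alphabetical order
plotted_order = list(
    train.groupby("Category", observed=True)["Salary"].mean().index
)
print(plotted_order)
```

Since `boxplot(column='Salary', by='Category')` groups on the `by` column internally, it should then draw the boxes in this order, though that is worth checking against the pandas version in use.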
On inspection, the limitation is inside pandas.tools.plotting.py:boxplot(), which converts the column object without allowing ordering:
I suppose I could either hack up a custom version of pandas boxplot(), or reach into the internals of the object. And also file an enhancement request.
conf_intervals: an array or sequence whose first dimension is compatible with x and whose second dimension is 2. positions: sets the positions of the boxes along the axis.
To draw a box plot for the given data, first arrange the data in ascending order, then find the minimum, first quartile, median, third quartile and maximum. For example, with twelve sorted values, the first quartile is the median of the first six values and the third quartile is the median of the remaining six.
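The median-of-halves rule described above can be sketched as follows. This is only an illustration of that hand calculation; the function name `five_number_summary` is made up here, and matplotlib computes its quartiles with percentile interpolation, so the boxes it draws may differ slightly from these numbers.

```python
import numpy as np

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = np.sort(np.asarray(values, dtype=float))
    half = len(data) // 2
    lower = data[:half]
    # For an odd count, leave the middle value out of both halves
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return (data[0], np.median(lower), np.median(data),
            np.median(upper), data[-1])

# Twelve values: Q1 is the median of the first six, Q3 of the last six
summary = five_number_summary(range(1, 13))
print(summary)
```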
Hard to say how to do this without a working example. My first guess would be to just add an integer column with the orders that you want.
A simple, brute-force way would be to add each boxplot one at a time.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(37,4), columns=list('ABCD'))
columns_my_order = ['C', 'A', 'D', 'B']
fig, ax = plt.subplots()
# Draw one box per column, placed at its index in the custom order
for position, column in enumerate(columns_my_order):
    ax.boxplot(df[column], positions=[position])
# Relabel the ticks so they match the custom order
ax.set_xticks(range(len(columns_my_order)))
ax.set_xticklabels(columns_my_order)
ax.set_xlim(xmin=-0.5)
plt.show()
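The same idea can be carried over to the grouped case from the question (one box per 'Category' value instead of one per column). The following is a rough sketch with a made-up `train` frame; `groups_in_order` and `plot_ordered` are hypothetical helper names, not pandas or matplotlib functions. It passes all groups to a single `ax.boxplot` call rather than looping.

```python
import pandas as pd

# Hypothetical stand-in for the question's data
train = pd.DataFrame({
    "Category": ["Admin Jobs", "IT Jobs", "Travel Jobs"] * 4,
    "Salary": [20, 50, 30, 22, 52, 32, 24, 54, 34, 26, 56, 36],
})

def groups_in_order(df, by, column, order):
    """Return one array of `column` values per key, in the given order."""
    grouped = df.groupby(by)[column]
    return [grouped.get_group(key).to_numpy() for key in order]

# Custom order: categories sorted by mean salary
order = list(train.groupby("Category")["Salary"].mean().sort_values().index)
data = groups_in_order(train, "Category", "Salary", order)

def plot_ordered(data, labels):
    """Draw one box per group, left to right in the order supplied."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.boxplot(data, positions=list(range(len(data))))
    ax.set_xticks(list(range(len(labels))))
    ax.set_xticklabels(labels, rotation=90)
    return ax
```

Calling `plot_ordered(data, order)` followed by `plt.show()` should then give the boxes in mean-salary order.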