I'm trying to make a single boxplot chart area per month, with a separate boxplot for each industry (grouped and labeled by industry), and then have the y-axis use a scale I dictate.
In a perfect world this would be dynamic: I could set the axis limits to a certain number of standard deviations from the overall mean. I could live with another way of setting the y-axis dynamically, but I would want it to be the same on all of the monthly grouped boxplots. I don't know the best way to handle this yet and am open to suggestions; all I know is that the numbers being used now are way too large for the charts to be meaningful.
I've tried all kinds of code and had zero luck with scaling the axis. The code below is as close as I could get to the graph I want.
Here's a link to some dummy data: https://drive.google.com/open?id=0B4xdnV0LFZI1MmlFcTBweW82V0k
And here is the code (I'm using Python 3.5):
import pandas as pd
import matplotlib
matplotlib.use('TkAgg')  # the backend must be selected before pyplot is imported
import matplotlib.pyplot as plt
df = pd.read_csv('Query_Final_2.csv')
df['Ship_Date'] = pd.to_datetime(df['Ship_Date'], errors='coerce')
df1 = df.groupby('Industry')
df1.boxplot(column='Gross_Margin', layout=(1, 9), figsize=(20, 10), whis=[5, 95])
plt.show()
Use boxplot() to draw a box plot showing distributions by category. To set the range of the y-axis, use the ylim() method. To display the figure, use the show() method.
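A minimal sketch of those three calls with plain matplotlib, assuming synthetic data and the hypothetical helper name `boxplot_with_ylim`; the category names and limits are placeholders, not values from the question's CSV.

```python
import numpy as np
import matplotlib.pyplot as plt

def boxplot_with_ylim(data, labels, ymin, ymax):
    """Draw one box per category, then fix the visible y-range."""
    fig, ax = plt.subplots()
    ax.boxplot(data)              # boxes are placed at x = 1, 2, ..., len(data)
    ax.set_xticklabels(labels)    # name each box after its category
    plt.ylim(ymin, ymax)          # pyplot form: applies to the current Axes (ax)
    return fig, ax

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic samples, for illustration only
    data = [rng.normal(0, 1, 200), rng.normal(0.5, 2, 200), rng.normal(-1, 0.5, 200)]
    boxplot_with_ylim(data, ["A", "B", "C"], -3, 3)
    plt.show()                    # display the figure
```

Because plt.ylim() targets whichever Axes is current, it is convenient for a single plot; with several subplots, calling ax.set_ylim() on each Axes is more explicit.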
Steps: set the figure size and adjust the padding between and around the subplots; create a pandas DataFrame (two-dimensional, size-mutable, potentially heterogeneous tabular data); then draw a box-and-whisker plot with boxplot(), passing a tuple to its widths parameter to adjust the width of each box.
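Those steps can be sketched as follows, assuming a small made-up DataFrame and the hypothetical helper `boxplot_with_widths`. Here the padding is handled with `fig.tight_layout()`; `fig.subplots_adjust()` is an alternative when you want manual control.

```python
import pandas as pd
import matplotlib.pyplot as plt

def boxplot_with_widths(df, widths, figsize=(7.5, 3.5)):
    """One box per DataFrame column; `widths` gives each box's width in x-data units."""
    fig, ax = plt.subplots(figsize=figsize)   # figure size in inches
    bp = ax.boxplot([df[c].dropna().to_numpy() for c in df.columns], widths=widths)
    ax.set_xticklabels(df.columns)            # label each box with its column name
    fig.tight_layout()                        # adjust padding between and around subplots
    return fig, ax, bp

# Hypothetical example data
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [5, 3, 4, 1, 2]})
fig, ax, bp = boxplot_with_widths(df, widths=(0.2, 0.5, 0.8))
```

The widths tuple needs one entry per box; a single scalar applies the same width to all of them.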
Customizing the box plot: notch=True draws notched boxes; patch_artist=True fills the boxes with color, so each box can be given a different color; vert=False draws horizontal box plots; labels takes one label per data set, so it must have the same length as the number of data sets.
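A sketch of these options together, assuming made-up data and the hypothetical helper `custom_boxplot`. The labels are applied with `set_yticklabels()` rather than the `labels` argument, because matplotlib 3.9 renamed that argument to `tick_labels`; likewise, recent releases add an `orientation` parameter and discourage `vert`, so check which one your version expects.

```python
import matplotlib.pyplot as plt

def custom_boxplot(data, labels, colors):
    """Notched, color-filled, horizontal boxes with one label per data set."""
    fig, ax = plt.subplots()
    bp = ax.boxplot(
        data,
        notch=True,          # draw a notch around each median
        patch_artist=True,   # boxes become fillable patches instead of lines
        vert=False,          # horizontal boxes
    )
    for patch, color in zip(bp["boxes"], colors):
        patch.set_facecolor(color)    # a different fill color per box
    ax.set_yticklabels(labels)        # horizontal layout: category labels go on the y-axis
    return fig, ax, bp

fig, ax, bp = custom_boxplot(
    [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]], ["first", "second"], ["lightblue", "lightpink"]
)
```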
Here is a cleaned-up version of your code with the solution:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('Query_Final_2.csv')
df['Ship_Date'] = pd.to_datetime(df['Ship_Date'], errors='coerce')
df1 = df.groupby('Industry')
# One subplot per industry; recent pandas returns a Series of Axes keyed by industry
axes = df1.boxplot(column='Gross_Margin', layout=(1, 9), figsize=(20, 10),
                   whis=[5, 95], return_type='axes')
for ax in axes:  # iterating the Series yields the Axes objects
    ax.set_ylim(-2.5, 2.5)
plt.show()
The key is to return the subplots as axes objects and set the limits individually.
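The question also asks for one chart per month with every industry's box on the same chart area. A minimal sketch of that, assuming the CSV has the Ship_Date, Industry and Gross_Margin columns used above; `make_dummy_data` is a hypothetical stand-in for the real file, and the industry names in it are invented. Passing `by="Industry"` together with a single Axes puts all industries on one plot, and the same limits are applied to every month.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def make_dummy_data(n=300, seed=0):
    """Hypothetical stand-in for Query_Final_2.csv (same column names, random values)."""
    rng = np.random.default_rng(seed)
    start = pd.Timestamp("2017-01-01")
    return pd.DataFrame({
        # random ship dates between 2017-01-01 and 2017-02-28
        "Ship_Date": start + pd.to_timedelta(rng.integers(0, 59, size=n), unit="D"),
        "Industry": rng.choice(["Retail", "Energy", "Health"], size=n),  # hypothetical
        "Gross_Margin": rng.normal(0.3, 1.0, size=n),
    })

def monthly_industry_boxplots(df, ylim, whis=(5, 95)):
    """One figure per calendar month, one box per industry, identical y-limits."""
    figures = {}
    months = df["Ship_Date"].dt.to_period("M")   # NaT dates are dropped by groupby
    for month, month_df in df.groupby(months):
        fig, ax = plt.subplots(figsize=(10, 5))
        # `by=` draws every industry's box on the single Axes passed in
        month_df.boxplot(column="Gross_Margin", by="Industry", ax=ax, whis=list(whis))
        ax.set_ylim(*ylim)       # the same scale on every monthly chart
        fig.suptitle("")         # remove pandas' automatic "Boxplot grouped by ..." title
        ax.set_title(f"Gross_Margin by Industry, {month}")
        figures[str(month)] = (fig, ax)
    return figures

figures = monthly_industry_boxplots(make_dummy_data(), ylim=(-2.5, 2.5))
```

With the real data you would pass the DataFrame read from the CSV instead of `make_dummy_data()` and call plt.show() afterwards; the fixed (-2.5, 2.5) range mirrors the answer above and can be swapped for computed limits.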
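Once you have computed the overall mean and standard deviation, set ymin = mean - k*std and ymax = mean + k*std for your chosen number of standard deviations k, then use:
plt.ylim(ymin, ymax)
to set the y-axis of the current plot.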
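A sketch of computing that band and applying it to several plots, assuming the hypothetical helper names `std_band_limits` and `apply_limits` and made-up margin values; k=2 is an arbitrary default, not a recommendation from the thread.

```python
import pandas as pd
import matplotlib.pyplot as plt

def std_band_limits(values, k=2.0):
    """Return (mean - k*std, mean + k*std) computed over all non-missing values."""
    s = pd.Series(values, dtype=float).dropna()
    mean, std = s.mean(), s.std()    # Series.std() is the sample std (ddof=1)
    return mean - k * std, mean + k * std

def apply_limits(axes, limits):
    """Give every Axes the same y-range so that charts stay comparable."""
    ymin, ymax = limits
    for ax in axes:
        ax.set_ylim(ymin, ymax)      # per-Axes equivalent of plt.ylim(ymin, ymax)

# Hypothetical margins; in practice pass df["Gross_Margin"] from the whole dataset
limits = std_band_limits([0.1, 0.4, -0.2, 0.3, 5.0, 0.25], k=2)
fig, (ax1, ax2) = plt.subplots(1, 2)
apply_limits([ax1, ax2], limits)
```

Computing the band once from the full dataset, rather than per month, is what keeps the scale identical across the monthly charts. Points outside the band (including fliers beyond the whiskers) will be cut off from view.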