I'm trying to make a grouped boxplot using Seaborn, and the boxes are all incredibly narrow: too narrow to see the grouping colors.
g = seaborn.factorplot("project_code", y="num_mutations", hue="organ",
                       data=grouped_donor, kind="box", aspect=3)
If I zoom in, or stretch the graphic several times the width of my screen, I can see the boxes, but obviously this isn't useful as a standard graphic.
This appears to be a function of my amount of data; if I plot only the first 500 points (of 6000), I get visible-but-small boxes. It might specifically be a function of the high variance of my data; according to the matplotlib boxplot documentation,
The default [width] is 0.5, or 0.15 × (distance between extreme positions) if that is smaller.
Regardless of the reason, there's plenty of room on the graph itself for wider boxes, if I could just widen them.
Unfortunately, the boxplot keyword widths, which controls the box width, isn't a valid factorplot keyword, and I can't find a matplotlib function that will change the width of a bar or box outside of the plotting function itself. I can't even find anyone discussing this; the closest I found was boxplot line width. Any suggestions?
In seaborn, the hue parameter determines which column in the data frame is used for colour encoding. The official lmplot documentation provides an example of this.
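A minimal sketch of hue-based colour encoding, using boxplot rather than lmplot (the hue mechanism is the same). The toy DataFrame and the specific colours are hypothetical; only the column names mirror the question. Passing palette as a dict keyed by hue level pins each level to a chosen colour:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data; "organ" is the column used for colour encoding
df = pd.DataFrame({
    "project_code": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "organ": ["liver"] * 3 + ["lung"] * 3 + ["skin"] * 3,
    "num_mutations": [10, 14, 12, 30, 25, 28, 7, 9, 8],
})

# Explicit mapping from hue level to colour
palette = {"liver": "tab:red", "lung": "tab:blue", "skin": "tab:green"}

fig, ax = plt.subplots()
sns.boxplot(x="project_code", y="num_mutations", hue="organ",
            data=df, palette=palette, ax=ax)

# Each hue level gets one legend entry
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)
```

Note that this toy data already reproduces the narrow-box symptom: each project has a single organ, so two thirds of every category slot stay empty.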
From the seaborn.boxplot documentation: Draw a box plot to show distributions with respect to categories. A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable.
The same page states: The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.
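A sketch of what the box and whiskers summarise, assuming the conventional 1.5 × IQR rule (seaborn's boxplot defaults to whis=1.5, which matplotlib uses to place the whisker fences). box_stats is a hypothetical helper for illustration, not a seaborn function:

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers under the 1.5 * IQR rule."""
    x = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    # Fences: points beyond these are drawn individually as outliers
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    outliers = x[(x < lo_fence) | (x > hi_fence)]
    # Whiskers stop at the most extreme data points still inside the fences
    return {"q1": q1, "median": med, "q3": q3,
            "whisker_low": inside.min(), "whisker_high": inside.max(),
            "outliers": outliers}

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats)
```

Here the box spans 3 to 7 with the median at 5, the whiskers reach 1 and 8, and 100 is flagged as an outlier.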
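When using sns.boxplot, adding dodge=False solves this problem as of version 0.9. With a hue variable, seaborn splits each category's slot evenly among all hue levels (dodging); when each x category contains only one hue level, as when every project_code belongs to a single organ, every box is squeezed into a thin sliver and most of the slot stays empty. With dodge=False, each box is drawn at the full width, centered on its category.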
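The width arithmetic can be sketched in plain Python. This is an approximation under stated assumptions: the default slot width of 0.8 and the roughly 2% shrink factor reflect the legacy (pre-0.13) seaborn categorical code as I understand it, and dodged_box_width is a hypothetical helper, not a seaborn API:

```python
def dodged_box_width(n_hue_levels, width=0.8, dodge=True, shrink=0.98):
    """Approximate width, in x-axis data units, of a single box.

    Assumption: mirrors legacy seaborn, where a category's slot of `width`
    is split evenly among *all* hue levels when dodging, then shrunk
    slightly to leave a small gap between neighbouring boxes.
    """
    if dodge:
        return width / n_hue_levels * shrink
    # Without dodging, every box uses the full slot width
    return width

for n in (1, 4, 20):
    print(n, round(dodged_box_width(n), 4), dodged_box_width(n, dodge=False))
```

With 20 organs as hue levels, each dodged box gets about 1/20 of its slot, which is why the boxes collapse into near-lines regardless of how much empty space the figure has.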
sns.factorplot() has been deprecated since version 0.9 and replaced by catplot(), which also has the dodge parameter.
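A self-contained sketch of the catplot version, assuming seaborn 0.9 or later. The grouped_donor frame here is hypothetical random data in which every project_code belongs to exactly one organ, mimicking the situation in the question:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line for interactive use
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)

# Hypothetical stand-in for grouped_donor: each project_code has one organ
organs = ["liver", "lung", "skin", "blood"] * 3
projects = {f"P{i:02d}": organ for i, organ in enumerate(organs)}
rows = [(code, organ, n)
        for code, organ in projects.items()
        for n in rng.lognormal(mean=4, sigma=1.5, size=50)]
grouped_donor = pd.DataFrame(rows, columns=["project_code", "organ", "num_mutations"])

# dodge=False: boxes are not split among hue levels, so each keeps full width
g = sns.catplot(x="project_code", y="num_mutations", hue="organ",
                data=grouped_donor, kind="box", aspect=3, dodge=False)
g.set(yscale="log")  # high-variance counts read better on a log axis
```

Because each project has a single organ, the undodged boxes never overlap, and the hue colours remain visible.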
For future reference, here are the relevant bits of code that produce the correct figure with a legend (this is missing important pieces and won't run as-is, but hopefully it shows the tricky parts):
import matplotlib.patches as mpatches
import matplotlib.pyplot as pyp
import seaborn as sns

def custom_legend(colors, labels, legend_location='upper left', legend_boundary=(1, 1)):
    # Create a custom legend: one colored rectangle (proxy artist) per label
    recs = [mpatches.Rectangle((0, 0), 1, 1, fc=color) for color in colors]
    pyp.legend(recs, labels, loc=legend_location, bbox_to_anchor=legend_boundary)

# Color boxplots by organ
organ_list = sorted(df_unique(grouped_samples, 'type'))  # df_unique: the author's own helper (not shown)
colors = sns.color_palette("Paired", len(organ_list))
color_dict = dict(zip(organ_list, colors))
# One color per box, listed in the same order as id_list (the plotting order)
id_to_organ = grouped_samples.drop_duplicates('id').set_index('id')['type']
organ_palette = [color_dict[organ] for organ in id_to_organ.loc[id_list]]

# Plot grouped boxplot (factorplot was renamed catplot in seaborn 0.9)
g = sns.factorplot("id", "num_mutations", data=grouped_samples, order=id_list, kind="box", size=7, aspect=3, palette=organ_palette)
sns.despine(left=True)
plot_setup_pre()  # the author's own helper (not shown)
pyp.yscale('log')
custom_legend(colors, organ_list)
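A standalone variant of the legend helper, for reference. It swaps in matplotlib.patches.Patch proxies and takes an explicit Axes argument instead of relying on pyplot's current-axes state; both are design choices of this sketch, not the original code. With a FacetGrid, the Axes would be g.ax:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

def custom_legend(ax, colors, labels, loc="upper left", anchor=(1, 1)):
    """Add a legend with one coloured swatch per label to `ax`.

    The Patch objects are proxy artists: they are never drawn on the
    axes themselves, they only supply the swatch shown in the legend.
    """
    handles = [Patch(facecolor=c, label=lab) for c, lab in zip(colors, labels)]
    return ax.legend(handles=handles, loc=loc, bbox_to_anchor=anchor)

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])  # placeholder content
leg = custom_legend(ax, ["tab:red", "tab:blue"], ["liver", "lung"])
```

Placing the legend with bbox_to_anchor=(1, 1) and loc="upper left" puts it just outside the top-right corner of the axes, so it does not cover any boxes.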