
How to widen boxes in Seaborn boxplot?

I'm trying to make a grouped boxplot using Seaborn (Reference), and the boxes are all incredibly narrow -- too narrow to see the grouping colors.

g = seaborn.factorplot("project_code",y="num_mutations",hue="organ",
        data=grouped_donor, kind="box", aspect=3)

If I zoom in, or stretch the graphic several times the width of my screen, I can see the boxes, but obviously this isn't useful as a standard graphic.

This appears to be a function of my amount of data; if I plot only the first 500 points (of 6000), I get visible-but-small boxes. It might specifically be a function of the high variance of my data; according to the matplotlib boxplot documentation,

The default [width] is 0.5, or 0.15x(distance between extreme positions) if that is smaller.

Regardless of the reason, there's plenty of room on the graph itself for wider boxes, if I could just widen them.

Unfortunately, the boxplot keyword widths, which controls the box width, isn't a valid factorplot keyword, and I can't find a matplotlib function that will change the width of a bar or box outside of the plotting function itself. I can't even find anyone discussing this; the closest I found was a question about boxplot line width. Any suggestions?

Lanthala asked Jun 26 '15 00:06



2 Answers

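If you call sns.boxplot directly, adding dodge=False solves this problem (available as of version 0.9). When hue is set, seaborn by default reserves a slot at every x position for each hue level, so each box gets only a fraction of the category width; dodge=False turns that off and lets each box use the full width.

sns.factorplot() has been deprecated since version 0.9 and replaced by catplot(), which also accepts the dodge parameter.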
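A minimal sketch of both calls, assuming seaborn 0.9 or later. The column names come from the question; the data, project codes and organ names are made up for illustration, and each project maps to a single organ, which is the situation where dodging wastes most of the width.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for grouped_donor (hypothetical values)
rng = np.random.default_rng(0)
projects = ["P1", "P2", "P3", "P4"]
organ_of = {"P1": "liver", "P2": "liver", "P3": "lung", "P4": "skin"}
grouped_donor = pd.DataFrame(
    [
        {"project_code": p, "organ": organ_of[p], "num_mutations": v}
        for p in projects
        for v in rng.lognormal(mean=3.0, sigma=1.0, size=50)
    ]
)

# Axes-level call: hue colors the boxes, dodge=False keeps them full width
fig, ax = plt.subplots(figsize=(9, 3))
sns.boxplot(
    data=grouped_donor,
    x="project_code",
    y="num_mutations",
    hue="organ",
    dodge=False,
    ax=ax,
)
ax.set_yscale("log")

# Figure-level equivalent of the factorplot call in the question
g = sns.catplot(
    data=grouped_donor,
    x="project_code",
    y="num_mutations",
    hue="organ",
    kind="box",
    dodge=False,
    aspect=3,
)
```

Because hue is still set, seaborn builds the color legend itself, so no hand-made legend is needed with this approach.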

ilyas answered Oct 02 '22 03:10


For future reference, here are the relevant bits of code that make the correct figure with legend: (obviously this is missing important things and won't actually run as-is, but hopefully it shows the tricky parts)

import matplotlib.patches as mpatches  # needed for Rectangle below
import matplotlib.pyplot as pyp
import seaborn as sns

def custom_legend(colors, labels, legend_location='upper left', legend_boundary=(1, 1)):
    # Create custom legend: one colored rectangle per label
    recs = [mpatches.Rectangle((0, 0), 1, 1, fc=color) for color in colors]
    pyp.legend(recs, labels, loc=legend_location, bbox_to_anchor=legend_boundary)

# Color boxplots by organ
# (df_unique, grouped_samples, id_list and plot_setup_pre are defined elsewhere and not shown)
organ_list = sorted(df_unique(grouped_samples, 'type'))
colors = sns.color_palette("Paired", len(organ_list))
color_dict = dict(zip(organ_list, colors))
# One color per id, in the order in which ids first appear in grouped_samples
organ_palette = grouped_samples.drop_duplicates('id')['type'].map(color_dict)

# Plot grouped boxplot
# (factorplot is named catplot, and size is named height, in seaborn 0.9 and later)
g = sns.factorplot(x="id", y="num_mutations", data=grouped_samples, order=id_list, kind="box", size=7, aspect=3, palette=organ_palette)
sns.despine(left=True)
plot_setup_pre()
pyp.yscale('log')
custom_legend(colors, organ_list)
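
The legend helper above can be tried on its own. This is a minimal sketch using only matplotlib; the organ names and colors are hypothetical, and it uses Patch handles instead of Rectangle, which is a small substitution rather than the code from the answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

# Hypothetical group-to-color mapping (stands in for color_dict above)
color_dict = {"liver": "tab:blue", "lung": "tab:orange", "skin": "tab:green"}

fig, ax = plt.subplots(figsize=(6, 3))

# One colored patch per group; the label is what appears in the legend
handles = [
    mpatches.Patch(facecolor=color, label=name)
    for name, color in color_dict.items()
]

# Anchor the legend's upper-left corner at the axes' upper-right corner,
# so it sits just outside the plotting area
legend = ax.legend(handles=handles, loc="upper left", bbox_to_anchor=(1, 1))

# Leave room on the right so the legend is not clipped when saving
fig.subplots_adjust(right=0.75)
```

Calling ax.legend on a specific Axes rather than pyp.legend makes it explicit which plot receives the legend, which helps when more than one figure is open.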

Lanthala answered Oct 02 '22 02:10