
Tweaking seaborn.boxplot

Tags: python, matplotlib, plot, seaborn, boxplot

I would like to compare a set of distributions of scores (score), grouped by some categories (centrality) and colored by another (model). I've tried the following with seaborn:

import matplotlib.pyplot as plt
import seaborn

plt.figure(figsize=(14, 6))
seaborn.boxplot(x="centrality", y="score", hue="model", data=data,
                palette=seaborn.color_palette("husl", len(models) + 1))
seaborn.despine(offset=10, trim=True)
plt.savefig("/home/i11/staudt/Eval/properties-replication-test.pdf", bbox_inches="tight")

There are some problems I have with this plot:

There are a large number of outliers and I don't like how they are drawn here. Can I remove them? Can I change their appearance to reduce the clutter? Can I at least color them so that their color matches the box color?
The model value original is special because all other distributions should be compared to the distribution of original. This should be visually reflected in the plot. Can I make original the first box of every group? Can I offset or mark it differently somehow? Would it be possible to draw a horizontal line through the median of each original distribution, across its group of boxes?
Some of the values of score are very small. How can I scale the y-axis properly to show them?

[Image: the resulting box plot, https://i.stack.imgur.com/h5JSC.png]

EDIT:

Here is an example with a log-scaled y-axis, which is also not yet ideal. Why do some boxes seem cut off at the low end?

[Image: the same plot with a log-scaled y-axis, https://i.stack.imgur.com/K4wOO.png]

599

asked Feb 01 '16 13:02

clstaudt

2 Answers

Outlier display

You should be able to pass any keyword argument to seaborn.boxplot that you can pass to plt.boxplot (see documentation), because seaborn forwards extra keyword arguments to matplotlib. So you can adjust the display of the outliers ("fliers" in matplotlib's terminology) by setting flierprops. Here are some examples of what you can do with your outliers.

If you don't want to display them, you could do

seaborn.boxplot(x="centrality", y="score", hue="model", data=data,
                showfliers=False)

or you could make them light gray like so:

flierprops = dict(markerfacecolor='0.75', markersize=5,
                  linestyle='none')
seaborn.boxplot(x="centrality", y="score", hue="model", data=data,
                flierprops=flierprops)
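If you want to keep the outliers but make them less cluttered, one option (a sketch, not checked against every seaborn version) is to draw them as small, faint dots through the same flierprops argument; marker, markersize, alpha and linestyle are ordinary matplotlib Line2D properties. The DataFrame is hypothetical random data, and the centrality and model names are made up to stand in for the question's data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn

# Hypothetical stand-in for the question's data: long-format scores per
# (centrality, model) pair, drawn from a heavy-tailed distribution so that
# outliers actually appear.
rng = np.random.default_rng(0)
rows = [
    (c, m, s)
    for c in ["degree", "betweenness", "closeness"]
    for m in ["original", "model_a", "model_b"]
    for s in rng.lognormal(mean=0.0, sigma=1.0, size=200)
]
data = pd.DataFrame(rows, columns=["centrality", "model", "score"])

# Small, semi-transparent dots instead of the default outlier markers.
flierprops = dict(marker=".", markersize=3, alpha=0.3, linestyle="none")

fig, ax = plt.subplots(figsize=(14, 6))
seaborn.boxplot(x="centrality", y="score", hue="model", data=data,
                flierprops=flierprops, ax=ax)
```

With many overlapping outliers, the low alpha makes dense regions look darker instead of turning into a solid band of markers.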

Order of groups

You can set the order of the boxes within each group manually with hue_order, e.g. to make original the first box:

seaborn.boxplot(x="centrality", y="score", hue="model", data=data,
                hue_order=["original", "Havel..","etc"])
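The question also asks for a horizontal line through the median of each original distribution, which this answer doesn't cover. A minimal sketch under stated assumptions: pass order as well, so the groups sit at x = 0, 1, 2, ..., then compute the medians of original with pandas and draw one short dashed line per group with ax.hlines. The ±0.4 half-width assumes seaborn's default box width of 0.8, and the data and names are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn

# Hypothetical data in the same long format as the question's `data`.
rng = np.random.default_rng(1)
centralities = ["degree", "betweenness", "closeness"]
models = ["original", "model_a", "model_b"]
data = pd.DataFrame(
    [(c, m, s) for c in centralities for m in models
     for s in rng.normal(loc=1.0, scale=0.2, size=100)],
    columns=["centrality", "model", "score"],
)

fig, ax = plt.subplots(figsize=(14, 6))
# `order` fixes each group's x position (0, 1, 2, ...);
# `hue_order` makes "original" the first box in every group.
seaborn.boxplot(x="centrality", y="score", hue="model", data=data,
                order=centralities, hue_order=models, ax=ax)

# Median of the "original" distribution within each centrality group,
# reindexed so it follows the same order as the x positions.
original_medians = (data[data["model"] == "original"]
                    .groupby("centrality")["score"].median()
                    .reindex(centralities))

# One dashed line across each group at that median. Assumes the default
# box width of 0.8, so a group spans roughly x - 0.4 to x + 0.4.
for x, median in enumerate(original_medians):
    ax.hlines(median, x - 0.4, x + 0.4, colors="black",
              linestyles="dashed", linewidth=1, zorder=10)
```

Because the line spans the whole group, each of the other boxes can be compared against the original median at a glance.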

Scaling of y-axis

You could get the minimum and maximum of all y-values and set the y-axis limits accordingly. Note that seaborn.boxplot has no y_lim argument; set the limits on the Axes object it returns instead. Something like this:

import numpy as np
y_values = data["score"].values
ax = seaborn.boxplot(x="centrality", y="score", hue="model", data=data)
ax.set_ylim(np.min(y_values), np.max(y_values))

EDIT: This last point doesn't really make sense, since the automatic y-axis range already includes all the values, but I'm leaving it as an example of how to adjust these settings. As mentioned in the comments, log-scaling probably makes more sense.
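A minimal log-scale sketch, assuming all scores are strictly positive: draw the plot as usual and switch the y-axis of the returned Axes with set_yscale("log"). The data below is hypothetical, with scores spanning six orders of magnitude:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn

# Hypothetical scores between 1e-6 and 1 (all strictly positive).
rng = np.random.default_rng(2)
data = pd.DataFrame({
    "centrality": np.repeat(["degree", "closeness"], 300),
    "model": np.tile(np.repeat(["original", "model_a", "model_b"], 100), 2),
    "score": 10.0 ** rng.uniform(-6, 0, size=600),
})

fig, ax = plt.subplots(figsize=(14, 6))
seaborn.boxplot(x="centrality", y="score", hue="model", data=data, ax=ax)
# Logarithmic y-axis so the very small scores are no longer squashed at the bottom.
ax.set_yscale("log")
```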

160

answered Oct 05 '22 16:10

Lisa

It has been a while since this question had any activity, but I'll answer the OP's question about the weird-looking lower bounds for anyone who needs help in the future.

Once you set your y-axis to a logarithmic scale, it becomes impossible to represent y = 0, since log(x) tends to -inf as x approaches 0.

Therefore, when the values in the lower part of your box plot are zero or very close to it, the box looks as if it has been 'cut in half'.

Needless to say, negative y values cannot be represented on a logarithmic scale either.
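A quick way to check whether this is happening is to count the scores that a log axis cannot place. As a possible workaround (a different technique from the answer above), matplotlib's "symlog" scale is linear in a small band around zero and logarithmic outside it, so zeros stay on the axis. The data is hypothetical, and the linthresh value of 1e-5 is an arbitrary assumption you would tune to your smallest meaningful score (the linthresh keyword needs matplotlib 3.3 or newer):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn

# Hypothetical scores where a quarter of the values are exactly zero.
rng = np.random.default_rng(3)
scores = 10.0 ** rng.uniform(-5, 0, size=400)
scores[:100] = 0.0
data = pd.DataFrame({
    "centrality": np.repeat(["degree", "closeness"], 200),
    "model": np.tile(["original", "model_a"], 200),
    "score": scores,
})

# Scores a log axis cannot show: log(0) is -inf, and log of a negative is undefined.
non_positive = (data["score"] <= 0).sum()
print(f"{non_positive} of {len(data)} scores are <= 0")  # prints "100 of 400 scores are <= 0"

fig, ax = plt.subplots(figsize=(14, 6))
seaborn.boxplot(x="centrality", y="score", hue="model", data=data, ax=ax)
# Linear between -1e-5 and 1e-5, logarithmic beyond, so zeros remain on the axis.
ax.set_yscale("symlog", linthresh=1e-5)
```

If the count is zero, the "cut in half" look comes from values very close to zero rather than from zeros themselves, and a plain log axis with a suitable lower limit may be enough.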

answered Oct 05 '22 15:10

André Cavalheiro
