
Setting flier (outlier) style in Seaborn boxplot is ignored

Using Seaborn, I can create boxplots of multiple columns of one pandas DataFrame on the same figure. I would like to apply a custom style to the fliers (outliers), e.g. setting the marker symbol, color and marker size.

The API documentation on seaborn.boxplot, however, only provides an argument fliersize which lets me control the size of the fliers but not the color and symbol.

Since Seaborn uses matplotlib for plotting, I thought I could provide a matplotlib styling dictionary to the boxplot function like so:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# create a dataframe
df = pd.DataFrame({'column_a': [3, 6, 200, 100, 7], 'column_b': [1, 8, 4, 150, 290], 'column_c': [6, 7, 20, 80, 275]})

# set figure size
sns.set(rc={"figure.figsize": (14, 6)})

# define outlier properties: a large red circle, drawn without a connecting line
flierprops = dict(marker='o', markerfacecolor='red', markersize=12, linestyle='none')

# create boxplot
ax = sns.boxplot(df, vert=False, showmeans=True, flierprops=flierprops)
plt.show()

Result: (boxplot image)

According to the provided dictionary, I would expect a large red circle representing the flier of column_c, but the default settings are still used.

This thread describes a similar problem when matplotlib is used directly. From the discussion, however, I gathered that it should already be fixed in recent versions of matplotlib.

I tried this with an iPython notebook (iPython 3.10), matplotlib 1.4.3 and seaborn 0.5.1.

Dirk asked Apr 15 '15 09:04




2 Answers

Seaborn's boxplot code ignores your flierprops argument: it overwrites it with its own value before passing the arguments on to Matplotlib's boxplot. Matplotlib's boxplot returns all the flier objects as part of its return value, so you could restyle them after the call, but Seaborn doesn't return that value.

The overwriting of flierprops (and sym) seems like a bug, so I'll see if I can fix it: see this issue. Meanwhile, you may want to consider using matplotlib's boxplot instead. Looking at seaborn's code may be useful (boxplot is in distributions.py).
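A minimal sketch of the matplotlib-only route, reusing the DataFrame from the question. The styling values (red face, size 12, black edge) are illustrative assumptions, and the boxes are drawn vertically here to keep the call simple.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same data as in the question
df = pd.DataFrame({
    'column_a': [3, 6, 200, 100, 7],
    'column_b': [1, 8, 4, 150, 290],
    'column_c': [6, 7, 20, 80, 275],
})

fig, ax = plt.subplots(figsize=(14, 6))

# Style the fliers up front via flierprops (illustrative values)
flierprops = dict(marker='o', markerfacecolor='red', markersize=12, linestyle='none')

# A 2D array yields one box per column
box = ax.boxplot(df.to_numpy(), showmeans=True, flierprops=flierprops)
ax.set_xticklabels(list(df.columns))

# The returned dict holds one Line2D per box under 'fliers',
# so the fliers can also be restyled after the call
for flier in box['fliers']:
    flier.set_markeredgecolor('black')
```

Call plt.show() afterwards to display the figure. With this data only column_c has a point beyond the default 1.5 * IQR whiskers, so only that box shows a flier.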


Update: there is now a pull request that fixes this (flierprops and other *props, but not sym)

cge answered Oct 12 '22 21:10


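# hollow black circles for the fliers
flierprops = dict(marker='o', markerfacecolor='None', markersize=10, markeredgecolor='black')
# df.Column stands for whichever column of your DataFrame you want to plot
sns.boxplot(y=df.Column, orient="v", flierprops=flierprops)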
Soubhik Sarkar answered Oct 12 '22 23:10