Using Seaborn, I can create boxplots of multiple columns of one pandas DataFrame
on the same figure. I would like to apply a custom style to the fliers (outliers), e.g. setting the marker symbol, color and marker size.
The API documentation on seaborn.boxplot, however, only provides an argument fliersize
which lets me control the size of the fliers but not the color and symbol.
Since Seaborn uses matplotlib for plotting, I thought I could provide a matplotlib styling dictionary to the boxplot
function like so:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# create a dataframe
df = pd.DataFrame({'column_a': [3, 6, 200, 100, 7], 'column_b': [1, 8, 4, 150, 290], 'column_c': [6, 7, 20, 80, 275]})
# set figure size
sns.set(rc={"figure.figsize": (14, 6)})
# define outlier properties
flierprops = dict(marker='o', markersize=5)
# create boxplot
ax = sns.boxplot(df, vert=False, showmeans=True, flierprops=flierprops)
plt.show()
Result:
According to the provided dictionary, I would expect a large red circle representing the flier of column_c, but instead the standard settings are still used.
This thread describes a similar problem when matplotlib is used directly. From the discussion, however, I gathered that this should have been fixed in recent versions of matplotlib.
I tried this in an IPython notebook (IPython 3.10), with matplotlib 1.4.3 and seaborn 0.5.1.
In R's ggplot2, outliers can be hidden by setting the outlier.shape argument to NA. Because ggplot2 does not automatically adjust the y-axis once the outliers are hidden, the coord_cartesian() function can be used to restrict the visible range to the quartiles of interest.
To remove the outliers from the chart, I have to set the showfliers parameter to False.
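As a minimal sketch of hiding the fliers, here is the showfliers parameter used with matplotlib's boxplot directly. The data (column_c from the question) and the Agg backend line are assumptions made only to keep the example self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

# column_c from the question; 275 would normally be drawn as a flier
data = [6, 7, 20, 80, 275]

fig, ax = plt.subplots()
# showfliers=False suppresses drawing of the outlier markers
box = ax.boxplot(data, showfliers=False)

# Count the flier points that were actually drawn
n_flier_points = sum(len(f.get_ydata()) for f in box['fliers'])
```

Note that this only hides the markers. The axis limits are then computed from the whiskers and boxes, so the plot zooms in on the bulk of the data.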
You can calculate it this way: Q1 = df.quantile(0.25), Q3 = df.quantile(0.75), IQR = Q3 - Q1. A value is an outlier if it is less than Q1 - 1.5 * IQR or greater than Q3 + 1.5 * IQR.
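The rule above can be sketched with pandas as follows, reusing the DataFrame from the question. The variable names (lower, upper, is_outlier) are chosen here for illustration only.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    'column_a': [3, 6, 200, 100, 7],
    'column_b': [1, 8, 4, 150, 290],
    'column_c': [6, 7, 20, 80, 275],
})

# Per-column quartiles and interquartile range
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

# Fences of the 1.5 * IQR rule
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Boolean mask, True where a value lies outside the fences of its column
is_outlier = (df < lower) | (df > upper)

# Number of outliers per column
outlier_counts = is_outlier.sum()
```

For this data only column_c has a value beyond the fences (275), which is consistent with the single flier mentioned in the question.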
To change only the symbol, you can leave out flierprops= and just use sym='*'. If, on the other hand, you need more control or need the exact positions of the fliers, they are also returned by box = plt.boxplot(...) as box['fliers'].
Seaborn's boxplot code ignores your flierprops argument and overwrites it with its own before passing the arguments on to Matplotlib's boxplot. Matplotlib's boxplot also returns all the flier objects as part of its return value, so you could modify them after running boxplot, but Seaborn doesn't return them.
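A small sketch of both options with matplotlib's own boxplot: sym sets the symbol, and the returned dictionary exposes the flier objects so they can be inspected or restyled afterwards. The data and the styling values applied in the loop are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

data = [6, 7, 20, 80, 275]  # column_c from the question

fig, ax = plt.subplots()
# sym changes only the flier symbol
box = ax.boxplot(data, sym='*')

# box['fliers'] is a list with one Line2D object per box
fliers = box['fliers']
marker_from_sym = fliers[0].get_marker()
flier_values = list(fliers[0].get_ydata())  # exact positions of the outliers

# The returned objects can be restyled after plotting
for flier in fliers:
    flier.set(marker='o', markerfacecolor='red', markersize=12, linestyle='none')
```

Restyling the returned Line2D objects does not depend on flierprops being honoured, which is why it is a useful fallback when calling matplotlib directly.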
The overwriting of flierprops (and sym) seems like a bug, so I'll see if I can fix it: see this issue. Meanwhile, you may want to consider using matplotlib's boxplot instead. Looking at seaborn's code may be useful (boxplot is in distributions.py).
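As a rough sketch of that workaround, the DataFrame from the question can be passed to matplotlib's boxplot with the desired flierprops. The red fill and size 12 are assumed values matching the "large red circle" the question asks for. The boxes are drawn vertically here, and the Agg backend line is only there to keep the example self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'column_a': [3, 6, 200, 100, 7],
    'column_b': [1, 8, 4, 150, 290],
    'column_c': [6, 7, 20, 80, 275],
})

# Assumed styling: a large red circle for every flier
flierprops = dict(marker='o', markerfacecolor='red', markersize=12, linestyle='none')

fig, ax = plt.subplots(figsize=(14, 6))
# Each column of the 2D array becomes one box
box = ax.boxplot(df.to_numpy(), showmeans=True, flierprops=flierprops)
ax.set_xticklabels(df.columns)

# The flier objects of the third box (column_c) carry the custom style
column_c_fliers = box['fliers'][2]
```

Unlike the seaborn versions discussed here, this call also returns the artists, so the styling can be checked or adjusted after the fact.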
Update: there is now a pull request that fixes this (flierprops and other *props, but not sym).
flierprops = dict(marker='o', markerfacecolor='None', markersize=10, markeredgecolor='black')
sns.boxplot(y=df.Column, orient="v", flierprops=flierprops)