 

Extract outliers from Seaborn Boxplot

Is there a way to extract all outliers after plotting a Seaborn boxplot? For example, suppose I plot a boxplot for the data below:

      client                total
1      LA                     1
2      Sultan                128
3      ElderCare              1
4      CA                     3
5      More                  900

After the boxplot is plotted, I want the following records returned as outliers:

2      Sultan                128
5      More                  900
asked Dec 12 '18 03:12 by Aaron


1 Answer

Seaborn relies on matplotlib for the outlier calculation, so the key parameter, whis, is passed on to ax.boxplot. The function that performs the calculation is documented here: https://matplotlib.org/api/cbook_api.html#matplotlib.cbook.boxplot_stats. Rather than extracting the outliers from the plot, you can compute them yourself with matplotlib.cbook.boxplot_stats. The following code snippet shows the calculation and how its result matches the seaborn plot:

import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats
import pandas as pd
import seaborn as sns

data = [
    ('LA', 1),
    ('Sultan', 128),
    ('ElderCare', 1),
    ('CA', 3),
    ('More', 900),
]
df = pd.DataFrame(data, columns=('client', 'total'))
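# Plot the numeric 'total' column; seaborn draws this single box at x = 0
ax = sns.boxplot(y=df['total'])

# boxplot_stats returns one stats dict per dataset; 'fliers' holds the points
# beyond the whiskers (by default the whiskers reach at most 1.5 * IQR past the quartiles)
outliers = [y for stat in boxplot_stats(df['total']) for y in stat['fliers']]
print(outliers)

# Mark the computed outliers at x = 1 for comparison with seaborn's fliers at x = 0
for y in outliers:
    ax.plot(1, y, 'p')
ax.set_xlim(right=1.5)
plt.show()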
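
The snippet above only prints the outlier values, while the question asks for the matching records. A minimal sketch for that, assuming you pass the same whis to boxplot_stats as to sns.boxplot (both default to 1.5): compute the fliers, then keep the rows whose value is among them using pandas' Series.isin. The helper name outlier_rows is hypothetical.

```python
import pandas as pd
from matplotlib.cbook import boxplot_stats


def outlier_rows(df, column, whis=1.5):
    """Return the rows of df whose `column` value is a boxplot flier.

    Hypothetical helper: `whis` is assumed to match the value passed to
    sns.boxplot so that both apply the same whisker rule.
    """
    # boxplot_stats returns a list with one stats dict per dataset;
    # a single column yields a single dict
    stats = boxplot_stats(df[column].to_numpy(), whis=whis)[0]
    # Keep every row whose value appears among the fliers
    return df[df[column].isin(stats['fliers'])]


df = pd.DataFrame(
    [('LA', 1), ('Sultan', 128), ('ElderCare', 1), ('CA', 3), ('More', 900)],
    columns=('client', 'total'),
)
print(outlier_rows(df, 'total'))
```

With the default rule only the More row (900) comes back; Sultan (128) lies within the whiskers, so neither seaborn nor boxplot_stats treats it as an outlier. Because rows are matched by value, any other row sharing a flier's value would be returned as well.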

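
Why 128 is not flagged can be checked by hand. The sketch below assumes the default whis=1.5 and numpy's default linear percentile interpolation, which is what boxplot_stats uses for the quartiles: the fences are Q1 - whis * IQR and Q3 + whis * IQR, and for this data the fliers are simply the values outside them.

```python
import numpy as np

# Values of the 'total' column from the question
totals = np.array([1, 128, 1, 3, 900])
whis = 1.5  # default whisker multiplier in matplotlib and seaborn

# Quartiles with numpy's default linear interpolation
q1, q3 = np.percentile(totals, [25, 75])
iqr = q3 - q1

# Fences: values beyond these are drawn as fliers
low_fence = q1 - whis * iqr
high_fence = q3 + whis * iqr

outliers = totals[(totals < low_fence) | (totals > high_fence)]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"fences: [{low_fence}, {high_fence}]")
print("outliers:", outliers)
```

Here Q1 = 1, Q3 = 128 and IQR = 127, so the upper fence is 128 + 1.5 * 127 = 318.5 and only 900 exceeds it. Since 128 is Q3 itself, boxplot_stats never places it beyond the whiskers, whatever scalar whis you pass.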

answered Oct 01 '22 11:10 by Y. Luo