Can scipy.stats identify and mask obvious outliers?

With scipy.stats.linregress I am performing a simple linear regression on several sets of highly correlated x, y experimental data, and so far I have been visually inspecting each x, y scatter plot for outliers. More generally (i.e. programmatically), is there a way to identify and mask outliers?

305

asked Apr 19 '12 15:04

a different ben

1 Answer

The statsmodels package has what you need. Look at this little code snippet and its output:

    # Imports
    import statsmodels.api as smapi
    import statsmodels.graphics as smgraphics

    # Make data (a list, so that insert() also works on Python 3)
    x = list(range(30))
    y = [y*10 for y in x]

    # Add outlier
    x.insert(6, 15)
    y.insert(6, 220)

    # Make graph
    # Note: OLS takes (endog, exog), so this call regresses x on y
    regression = smapi.OLS(x, y).fit()
    figure = smgraphics.regressionplots.plot_fit(regression, 0)

    # Find outliers: index 2 of each row is the Bonferroni-corrected p-value
    test = regression.outlier_test()
    outliers = ((x[i], y[i]) for i, t in enumerate(test) if t[2] < 0.5)
    print('Outliers:', list(outliers))

Example figure 1

Outliers: [(15, 220)]

Edit

With newer versions of statsmodels, the API has changed a bit. Here is an updated code snippet that shows the same type of outlier detection.

    # Imports
    from random import random
    import statsmodels.api as smapi
    from statsmodels.formula.api import ols
    import statsmodels.graphics as smgraphics

    # Make data (a list, so that insert() also works on Python 3)
    x = list(range(30))
    y = [y*(10+random())+200 for y in x]

    # Add outlier
    x.insert(6, 15)
    y.insert(6, 220)

    # Make fit
    regression = ols("data ~ x", data=dict(data=y, x=x)).fit()

    # Find outliers: column 2 of the test table is the Bonferroni-corrected p-value
    # (DataFrame.icol() was removed from pandas; iloc is its replacement)
    test = regression.outlier_test()
    outliers = ((x[i], y[i]) for i, t in enumerate(test.iloc[:, 2]) if t < 0.5)
    print('Outliers:', list(outliers))

    # Figure
    figure = smgraphics.regressionplots.plot_fit(regression, 1)

    # Add line
    smgraphics.regressionplots.abline_plot(model_results=regression, ax=figure.axes[0])

Example figure 2

Outliers: [(15, 220)]

110

answered Sep 17 '22 09:09

xApple
