 

Definition of standard error in scipy.stats.linregress

Tags:

python

scipy

I'm using the scipy.stats.linregress function to do a simple linear regression on some 2D data, e.g.:

from scipy import stats
x = [5.05, 6.75, 3.21, 2.66]
y = [1.65, 26.5, -5.93, 7.96]
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)

The documentation on the function states that std_err is the:

Standard error of the estimate

I'm not sure what this means. This old answer says that it represents the "standard error of the gradient line" but that this "was not always the behaviour of this library".

Could I get a precise definition of what exactly this parameter represents?

Gabriel asked Jul 16 '15 13:07


2 Answers

As of Dec 2016, I think that it still returns the standard error of the slope of the OLS regression line. I calculated the regression of some datasets using orthogonal distance regression from the scipy package (scipy.odr), and the output's sd_beta[1] (which represents the standard error of the slope of the regression line) was very similar to the std_err calculated by scipy.stats.linregress.
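A sketch of that kind of cross-check, using the data from the question. Two assumptions to flag: the model function `line` is hypothetical and parameterised as B[0] + B[1]*x so that sd_beta[1] is the slope (as in the answer), and the fit is switched to ordinary least squares with `set_job(fit_type=2)`. With the default orthogonal-distance fit (fit_type=0) the numbers should only be similar, which matches what the answer describes; in OLS mode they are expected to agree closely.

```python
import numpy as np
from scipy import odr, stats

# Data from the question
x = np.array([5.05, 6.75, 3.21, 2.66])
y = np.array([1.65, 26.5, -5.93, 7.96])

# Hypothetical model layout: B[0] is the intercept, B[1] the slope,
# so sd_beta[1] is the standard error of the slope.
def line(B, x):
    return B[0] + B[1] * x

fit = odr.ODR(odr.RealData(x, y), odr.Model(line), beta0=[0.0, 1.0])
fit.set_job(fit_type=2)  # 2 = ordinary least squares instead of orthogonal distance
out = fit.run()

res = stats.linregress(x, y)
print("ODR (OLS mode) slope SE:", out.sd_beta[1])
print("linregress std_err:     ", res.stderr)
```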

spacetyper answered Oct 30 '22 01:10


This is a standard measure in statistics. See Wikipedia for a description of how to compute it. Unfortunately, Stack Overflow does not seem to have LaTeX support, so it does not make sense to write out and explain the equations here.
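For reference, a sketch of the usual textbook formula for the standard error of the slope in simple OLS regression, assuming fitted intercept \(\hat a\), fitted slope \(\hat b\) and \(n\) data points. The \(n-2\) accounts for the two fitted parameters.

```latex
\mathrm{SE}(\hat b) = \sqrt{\frac{\frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat y_i\right)^2}{\sum_{i=1}^{n}\left(x_i - \bar x\right)^2}},
\qquad \hat y_i = \hat a + \hat b\, x_i
```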

Essentially, a standard error is reported for each estimated coefficient; the std_err returned by linregress is the one for the gradient (slope). In simple terms, std_err tells you how precisely the gradient is determined by your data (higher values mean a less precise estimate).
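A minimal sketch that recomputes the textbook slope and intercept standard errors by hand and compares them with what linregress reports, using the question's data. It assumes a SciPy version whose result object exposes `intercept_stderr` alongside `stderr` (added in SciPy 1.6, as far as I know); on older versions only the slope comparison applies.

```python
import numpy as np
from scipy import stats

# Data from the question
x = np.array([5.05, 6.75, 3.21, 2.66])
y = np.array([1.65, 26.5, -5.93, 7.96])

res = stats.linregress(x, y)

n = len(x)
y_hat = res.intercept + res.slope * x        # fitted values on the regression line
s2 = np.sum((y - y_hat) ** 2) / (n - 2)      # residual variance, n - 2 degrees of freedom
sxx = np.sum((x - x.mean()) ** 2)            # spread of x around its mean

se_slope = np.sqrt(s2 / sxx)                               # textbook SE of the slope
se_intercept = np.sqrt(s2 * np.sum(x ** 2) / (n * sxx))    # textbook SE of the intercept

print("slope SE:    ", res.stderr, se_slope)
print("intercept SE:", res.intercept_stderr, se_intercept)
```

If the two columns agree, std_err is the standard error of the slope, not of the intercept or of the individual predictions.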

Other useful answers on stats.stackexchange.com are here and here.

James Pringle answered Oct 30 '22 01:10