It seems all three functions can do simple linear regression, e.g.
scipy.stats.linregress(x, y)
numpy.polynomial.polynomial.polyfit(x, y, 1)
x = statsmodels.api.add_constant(x)
statsmodels.api.OLS(y, x)
I wonder if there is any real difference between the three methods? I know that statsmodels is built on top of scipy, and scipy depends on numpy for many things, so I expect they should not differ vastly, but the devil is always in the details.
More specifically, if we use the numpy method above, how do we get the p-value of the slope, which the other two methods give by default?
I am using them in Python 3, if that makes any difference.
The three are very different but overlap in the parameter estimation for the very simple example with only one explanatory variable.
By increasing generality:
scipy.stats.linregress only handles the case of a single explanatory variable with specialized code and calculates a few extra statistics.
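For example, here is what linregress returns (using some made-up noisy data just to illustrate; the data values are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical data: y is roughly 2*x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

res = stats.linregress(x, y)
# The result carries slope, intercept, rvalue, pvalue, and stderr,
# so the slope's p-value comes for free here.
print(res.slope, res.intercept, res.rvalue, res.pvalue, res.stderr)
```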
numpy.polynomial.polynomial.polyfit estimates the regression for a polynomial of a single variable, but doesn't return much in terms of extra statistics.
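Since polyfit itself reports no p-value, one way to recover the slope's p-value by hand is the standard t-test on the slope. This is a sketch with invented data; the formula below is the textbook OLS standard error with n - 2 degrees of freedom, not anything polyfit provides:

```python
import numpy as np
from scipy import stats

# Hypothetical data for illustration.
rng = np.random.default_rng(1)
x = np.arange(30, dtype=float)
y = 0.5 * x + 3.0 + rng.normal(scale=1.0, size=x.size)

# polyfit with deg=1 returns coefficients ordered low -> high:
# coef[0] is the intercept, coef[1] is the slope.
coef = np.polynomial.polynomial.polyfit(x, y, 1)
intercept, slope = coef

# Standard error of the slope from the residual sum of squares,
# then a two-sided t-test with n - 2 degrees of freedom.
n = x.size
y_hat = intercept + slope * x
rss = np.sum((y - y_hat) ** 2)
se_slope = np.sqrt(rss / (n - 2) / np.sum((x - x.mean()) ** 2))
t_stat = slope / se_slope
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(slope, p_value)
```

This should agree with the p-value that scipy.stats.linregress reports on the same data, since linregress uses the same t-test internally.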
statsmodels OLS is a generic linear model (OLS) estimation class. It doesn't prespecify what the explanatory variables are and can handle any multivariate array of explanatory variables, or formulas and pandas DataFrames. It not only returns the estimated parameters, but also a large set of results statistics and methods for statistical inference and prediction.
For completeness of options for estimating linear models in Python (outside of Bayesian analysis), we should also consider scikit-learn's LinearRegression and similar linear models, which are useful for selecting among a large number of explanatory variables but do not have the large set of results that statsmodels provides.
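For comparison, a quick sketch of scikit-learn's LinearRegression on made-up data; note that it exposes only the fitted coefficients, not standard errors or p-values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data for illustration; scikit-learn expects a 2-D X.
rng = np.random.default_rng(3)
x = np.arange(40, dtype=float).reshape(-1, 1)
y = -0.7 * x.ravel() + 4.0 + rng.normal(scale=0.5, size=40)

reg = LinearRegression().fit(x, y)
print(reg.coef_[0], reg.intercept_)  # slope and intercept; no inference stats
```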