I want to calculate the least squares estimate for given data.
There are a few ways to do this, one is to use numpy's least squares:
import numpy as np
np.linalg.lstsq(X, y, rcond=None)[0]
where X is a matrix and y is a vector of compatible dimensions (both float64). The second way is to compute the result directly from the normal-equations formula:
import numpy
numpy.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
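As a quick sanity check, here is a minimal runnable sketch of both approaches on hypothetical, well-conditioned toy data (the values and the intercept-plus-slope design matrix are my own example, not from the question); on data like this the two methods agree:

```python
import numpy as np

# Hypothetical toy data: fit y = 1 + 2*x with an intercept column.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # design matrix [1, x]
y = 2.0 * x + 1.0

# Method 1: numpy's least-squares solver.
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

# Method 2: the normal-equations formula (X^T X)^(-1) X^T y.
beta_normal = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

print(beta_lstsq)   # both should be close to [1, 2]
print(beta_normal)
```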
My problem: there are cases where the two formulas give radically different results (in other cases they agree). With one formula the coefficients sometimes grow extremely large, while the other stays well behaved. The formulas are mathematically the same, so why do the results diverge so much? Is this some kind of rounding error, and if so, how do I minimize it?
While those two formulas are mathematically equivalent, they are not numerically equivalent! There are better ways to solve a system of linear equations Ax = b than multiplying both sides by A^(-1), such as Gaussian elimination. numpy.linalg.lstsq uses these (and more sophisticated) methods to solve the underlying linear system, and it also handles many corner cases. So use it when you can.
Matrix inversion is very numerically unstable. Don't do it unless you have to.