Why is numpy.linalg.pinv() preferred over numpy.linalg.inv() for computing the inverse of a matrix in linear regression?

To find the optimal parameters theta of a linear regression model with the normal equation

theta = inv(X^T * X) * X^T * y

one step is to compute inv(X^T * X). NumPy offers two functions for this: np.linalg.inv() and np.linalg.pinv().

However, the two lead to different results:

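import numpy as np

# 4 training examples, 5 columns (intercept column plus 4 features)
X = np.matrix([[1, 2104, 5, 1, 45], [1, 1416, 3, 2, 40], [1, 1534, 3, 2, 30], [1, 852, 2, 1, 36]])
y = np.matrix([[460], [232], [315], [178]])

XT = X.T
XTX = XT @ X

# Normal equation using the pseudo-inverse
pinv = np.linalg.pinv(XTX)
theta_pinv = (pinv @ XT) @ y
print(theta_pinv)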

[[188.40031946]
 [  0.3866255 ]
 [-56.13824955]
 [-92.9672536 ]
 [ -3.73781915]]

# Normal equation using the plain inverse
inv = np.linalg.inv(XTX)
theta_inv = (inv @ XT) @ y
print(theta_inv)

[[-648.7890625 ]
 [   0.79418945]
 [-110.09375   ]
 [ -74.0703125 ]
 [  -3.69091797]]

The first output, the one from pinv, is the correct one, and pinv is also what the numpy.linalg.pinv() docs recommend. But why is that, and what are the differences and the pros and cons of inv() versus pinv()?

249

asked Mar 19 '18 07:03

2Obe

2 Answers

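A matrix whose determinant is zero is singular: it has no inverse, so inv cannot return a meaningful result for it. That is the case here. X has 4 rows and 5 columns, so X^T*X is a 5x5 matrix of rank at most 4.

pinv still works. It computes the Moore-Penrose pseudo-inverse, which equals the ordinary inverse when the matrix is invertible and remains well defined when it is not.

inv most likely did not raise an error in your example only because rounding errors in floating point arithmetic make X^T*X look very slightly non-singular. The result it returns is then dominated by those rounding errors, which is why the two functions disagree.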

You can read more about how the pseudo-inverse works here.
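As a quick check of the singularity argument, here is a minimal sketch that reuses the data from the question. It uses plain arrays instead of np.matrix, and the variable names rank_X and cond_XTX are only illustrative. It relies on np.linalg.matrix_rank and np.linalg.cond to show that X cannot have full column rank and that X^T*X is numerically singular.

```python
import numpy as np

# Data copied from the question: 4 samples, 5 columns (including the intercept).
X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30],
              [1,  852, 2, 1, 36]], dtype=float)

XTX = X.T @ X  # 5x5, but its rank cannot exceed the 4 rows of X

rank_X = np.linalg.matrix_rank(X)   # numerical rank, computed from the SVD
cond_XTX = np.linalg.cond(XTX)      # ratio of largest to smallest singular value

print("columns of X:", X.shape[1])
print("rank of X:", rank_X)         # fewer than 5, so X^T X is singular
print("cond(X^T X):", cond_XTX)     # a huge value signals (near-)singularity
```

A rank below the number of columns, together with a very large condition number, is a sign that the output of inv should not be trusted for this matrix.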

152

answered Sep 17 '22 07:09

Vedant Shetty

inv and pinv are meant for computing the (pseudo-)inverse as a standalone matrix, not for use as an intermediate step in solving a linear system.

For solving such linear systems, the proper tool is numpy.linalg.lstsq (or scipy.linalg.lstsq) if the coefficient matrix is not invertible, or numpy.linalg.solve (or scipy.linalg.solve) if it is.
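For the data in the question, a sketch of the lstsq route could look like the following. It assumes plain arrays instead of np.matrix and passes rcond=None so that NumPy picks its machine-precision-based cutoff. The names theta_lstsq and theta_pinv are only illustrative. The pseudo-inverse of X itself (rather than of X^T*X) is included for comparison, since both give the minimum-norm least-squares solution.

```python
import numpy as np

# Data copied from the question.
X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30],
              [1,  852, 2, 1, 36]], dtype=float)
y = np.array([[460], [232], [315], [178]], dtype=float)

# Solve min ||X @ theta - y|| directly, without ever forming X^T X.
theta_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)

# The same minimum-norm solution via the pseudo-inverse of X.
theta_pinv = np.linalg.pinv(X) @ y

print(theta_lstsq.ravel())
print("effective rank of X:", rank)
print("lstsq and pinv agree:", np.allclose(theta_lstsq, theta_pinv))
print("fits the data exactly:", np.allclose(X @ theta_lstsq, y))
```

Working on X directly avoids squaring the condition number, which is what forming X^T*X does.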

answered Sep 17 '22 07:09

percusse

Tags: python, matrix, numpy, linear-regression, linear-algebra
