I found that there are two versions of the pinv() function, which calculates the pseudo-inverse of a matrix, one in SciPy and one in NumPy. The documentation can be viewed at:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.pinv.html
http://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.pinv.html
The problem is that I have a 50000*5000 matrix. When I use scipy.linalg.pinv, it costs me more than 20GB of memory. But when I use numpy.linalg.pinv, less than 1GB of memory is used.
I was wondering why numpy and scipy both have a pinv with different implementations, and why their performance is so different.
I can't speak as to why there are implementations in both scipy and numpy, but I can explain why the behaviour is different.
numpy.linalg.pinv approximates the Moore-Penrose pseudo-inverse using an SVD (the LAPACK routine dgesdd, to be precise), whereas scipy.linalg.pinv solves a model linear system in the least-squares sense to approximate the pseudo-inverse (using dgelss). This is why their performance is different. I would expect the overall accuracy of the resulting pseudo-inverse estimates to be somewhat different as well.
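To make the two approaches concrete, here is a minimal sketch of both using only NumPy. The helper names pinv_via_svd and pinv_via_lstsq are made up for illustration, and the code is not the actual library source. It assumes a real-valued matrix and assumes that the least-squares route works by solving A X = I against an m-by-m identity right-hand side.

```python
import numpy as np

def pinv_via_svd(a, rcond=1e-15):
    """Pseudo-inverse from a thin SVD (hypothetical helper, real matrices only)."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    # Invert only the singular values above the cutoff; the rest stay zero.
    cutoff = rcond * s.max()
    s_inv = np.zeros_like(s)
    keep = s > cutoff
    s_inv[keep] = 1.0 / s[keep]
    # pinv(A) = V * diag(1/s) * U^T
    return (vt.T * s_inv) @ u.T

def pinv_via_lstsq(a):
    """Pseudo-inverse as the least-squares solution of A X = I (hypothetical helper)."""
    m = a.shape[0]
    # Assumption: the right-hand side is a dense m-by-m identity matrix.
    identity = np.eye(m, dtype=a.dtype)
    x, *_ = np.linalg.lstsq(a, identity, rcond=None)
    return x

# Small stand-in for the 50000*5000 matrix from the question.
rng = np.random.default_rng(0)
a = rng.standard_normal((50, 5))

p_svd = pinv_via_svd(a)
p_lstsq = pinv_via_lstsq(a)

# Size of a dense float64 identity for m = 50000 rows, in bytes.
identity_bytes = 50000 * 50000 * 8
```

If the least-squares route does build that dense identity, then for m = 50000 the identity alone takes 50000 * 50000 * 8 bytes, about 20 GB, which would be in line with the memory use reported in the question. The thin SVD only needs factors on the order of the input matrix itself.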
You might find that scipy.linalg.pinv2 performs more similarly to numpy.linalg.pinv, as it too uses an SVD method rather than a least-squares approximation.