I'm trying to port some code from Matlab to Python and I've run into a problem. I can't seem to find an equivalent to svds.
I tried using numpy.corrcoef and then numpy.linalg.eig, but numpy.corrcoef doesn't work for large arrays (say 500 x 20000).
Here's the code in MATLAB, in case it makes any difference:
s = size(data, 2)
mean = sum(data, 2)/s
m_data = ( data - repmat(mean, 1, s) ) / sqrt(s - 1)
[res_u,res_s] = svds(m_data, s)
eigenvals = diag(res_s).^2
eigenvecs = res_u
MATLAB is faster than Python, but Python is better at running multiple jobs in parallel.
The code is almost the same, yet the performance is very different: MATLAB completes the task in 0.252454 seconds, while NumPy takes 0.973672 seconds, almost four times as long.
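For reference, the NumPy side of that measurement looks roughly like this (the timing harness and the random test matrix below are only a sketch of the setup, not the exact benchmark code):

import time
import numpy as np

# Rough timing of the NumPy equivalent on the same problem size.
data = np.random.rand(500, 20000)
start = time.perf_counter()
s = data.shape[1]
m_data = (data - data.mean(axis=1, keepdims=True)) / np.sqrt(s - 1)
u, sv, vt = np.linalg.svd(m_data, full_matrices=False)
eigenvals = sv ** 2
print('elapsed: %.6f s' % (time.perf_counter() - start))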
NumPy arrays are the equivalent of the basic array data structure in MATLAB. With NumPy arrays, you can do things like inner and outer products, transposition, and element-wise operations.
So what you are looking for would be something like the following in Python and NumPy. (I took the liberty of not translating the MATLAB code directly; instead I refactored it a little to feel more Pythonic. Of course, quite similar refactoring could be applied to the MATLAB code as well.)
import numpy as np

def _cas(D):
    """Center at the mean and scale by sqrt(n - 1), as in the MATLAB snippet."""
    return (D - D.mean(axis=1)[:, None]) / (D.shape[1] - 1) ** 0.5

def example(D):
    """Eigenvalues and eigenvectors, based on the SVD."""
    u, s, v = np.linalg.svd(D, full_matrices=False)
    return s ** 2, u  # squared singular values are the eigenvalues, as a vector

if __name__ == '__main__':
    data = np.random.rand(5, 20)
    data = _cas(data)  # preprocess data according to your requirements
    eigenvals, eigenvecs = example(data)
    print(eigenvals)
    print(eigenvecs)
But you have a performance problem with it? Can you be more specific about your current performance, and how much you really expect it to be improved? FWIW, on my modest computer a random (500, 20000) matrix spends some 20 seconds executing example(.). I can trivially (due to basic linear algebra) reduce that execution time to around 2.5 seconds, almost a 10-fold improvement! Now, if you are looking for much better performance than this, please elaborate in much more detail on the 'nature' of your data.
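The answer does not spell out the shortcut, but one standard trick that gives this kind of speed-up for a short-and-wide matrix is to diagonalize the small Gram matrix instead of computing the full SVD. A minimal sketch (the function name and details are mine, not from the original answer):

import numpy as np

def eig_via_gram(D):
    """Eigenvalues and left singular vectors via the small Gram matrix.

    For an (n, m) matrix with n << m (e.g. 500 x 20000), the (n, n) Gram
    matrix D @ D.T is much cheaper to diagonalize than the full SVD of D:
    its eigenvalues are the squared singular values of D and its
    eigenvectors are the left singular vectors of D.
    """
    eigenvals, eigenvecs = np.linalg.eigh(D @ D.T)  # symmetric eigensolver
    order = np.argsort(eigenvals)[::-1]             # sort descending
    return eigenvals[order], eigenvecs[:, order]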
Where is your data coming from? What is your specific use case for the calculated eigenvalues and eigenvectors, i.e. what is your main goal?
If your matrix is sparse, you can use scipy.sparse.linalg.svds in a recent SciPy distribution. It wraps the sparse SVD routine in ARPACK. In my experience, it is buggy (not sure if this is the fault of SciPy or ARPACK), so I advise running tests to check that the singular values are as expected.
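For example, a quick sanity check of that sort might look like the following (the test matrix size, density, and tolerance are illustrative assumptions):

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

# Compare the sparse SVD against the dense SVD on a small random matrix
# before trusting it on the real data.
A = sp.random(50, 200, density=0.05, format='csr', random_state=0)
k = 10
u, s, vt = svds(A, k=k)                           # k largest singular values
s_sparse = np.sort(s)[::-1]                       # svds does not sort them
s_dense = np.linalg.svd(A.toarray(), compute_uv=False)[:k]
print(np.allclose(s_sparse, s_dense, atol=1e-8))  # expect True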
If your matrix is dense, 500 by 20000 is big but not intractable on a commodity desktop these days. You are calling numpy.linalg.svd with full_matrices=False, right?
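With that flag, only the thin factors are computed, for example (the random matrix below is just a stand-in for your data):

import numpy as np

# With full_matrices=False only the "thin" factors are returned, so the
# prohibitively large (20000, 20000) right factor is never formed.
D = np.random.rand(500, 20000)
u, s, vt = np.linalg.svd(D, full_matrices=False)
print(u.shape, s.shape, vt.shape)  # (500, 500) (500,) (500, 20000)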
What are you using numpy.corrcoef for? Nothing like that appears in the MATLAB code snippet that you posted.