Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: Equivalent to Matlab's svds(A, k) for large arrays?

Tags:

python

matlab

I'm trying to port some code from Matlab to Python and I've run into a problem. I can't seem to find an equivalent to svds.

I tried using numpy.corrcoef and then numpy.linalg.eig, but numpy.corrcoef doesn't work for large arrays (say 500 x 20000).

Here's the code in matlab, in case it makes any difference:

s = size(data, 2)
mean = sum(data, 2)/s
m_data = ( data - repmat(mean, 1, s) ) / sqrt(s - 1)
[res_u,res_s] = svds(m_data, s)
eigenvals = diag(res_s).^2
eigenvecs = res_u
like image 948
digdin Avatar asked Feb 11 '11 22:02

digdin


People also ask

Is Python faster than Matlab?

Matlab is faster than Python, but Python is better at running multiple jobs in parallel.

Which is faster NumPy or Matlab?

The code is almost the same, but the performance is very different. The time matlab takes to complete the task is 0.252454 seconds while numpy 0.973672151566, that is almost four times more.

Is NumPy similar to Matlab?

NumPy arrays are the equivalent to the basic array data structure in MATLAB. With NumPy arrays, you can do things like inner and outer products, transposition, and element-wise operations.


2 Answers

So what you are looking for, would be something like this in python and numpy (I took the liberty not to 'directly translate' the matlab-code to python and numpy, instead I refactored it a little bit to 'feel' more pythonic [of'course quite similar refactoring can be applied to the matlab-code as well]):

import numpy as np

def _cas(D):
    """Center at mean and standardize."""
    return (D- D.mean(1)[:, None])/ (D.shape[1]- 1)** .5

def example(D):
    """Eigenvalues and -vectors, based on SVD."""
    u, s, v= np.linalg.svd(D, full_matrices= False);
    return np.diag(s)** 2, u

if __name__ == '__main__':
    data= np.random.rand(5, 20)
    data= _cas(data) # preprocess data according to your requirements
    eigenvals, eigenvecs= example(data)
    print eigenvals
    print eigenvecs

But you have a performance problem with it?

Can you now be more specific of your current performance, and how much you really to expect it should be enhanced? FTIW in my modest computer, a random (500, 20000) matrix will spend executing example(.) some 20 secs.

I can trivially (due to basic linear algebra) reduce the execution time down to a level like 2.5 sec (allmost 10 fold improvement)! Now, if you are looking for much better performance than this, then please elaborate in much more detailed level on the 'nature' of your data!

Where is your data coming from? What is your specific case to utilize the calculated eigenvalues, and -vectors? i.e. what is your main goal to achieve?

like image 144
12 revs Avatar answered Sep 17 '22 18:09

12 revs


If your matrix is sparse, you can use scipy.sparse.linalg.svds in a recent scipy distribution. It wraps the sparse SVD routine in ARPACK. In my experience, it is buggy (not sure if this is the fault of scipy or ARPACK), so I advise running tests to check that the singular values are as expected.

If your matrix is dense, 500 by 20000 is big but not intractable on a commodity desktop these days. You are calling numpy.linalg.svd with full_matrices=False, right?

What are you using numpy.corrcoef for? Nothing like that appears in the MATLAB code snippet that you posted.

like image 43
chairmanK Avatar answered Sep 17 '22 18:09

chairmanK