Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to compute k largest eigenvalues and corresponding eigenvectors with numpy

I have a large NxN dense symmetric matrix and want the eigenvectors corresponding to the k largest eigenvalues. What's the best way to find them (preferably using numpy but perhaps in general using blas/atlas/lapack if that's the only way to go)? In general N is much much larger then k (say N > 5000, k < 10).

Numpy seems to only have functions for finding the k largest eigenvalues if my starting matrix is sparse.

like image 806
Anthony Bak Avatar asked Aug 28 '12 21:08

Anthony Bak


2 Answers

In SciPy, you can use the linalg.eigh function, with the eigvals parameter.

eigvals : tuple (lo, hi) Indexes of the smallest and largest (in ascending order) eigenvalues and corresponding eigenvectors to be returned: 0 <= lo < hi <= M-1. If omitted, all eigenvalues and eigenvectors are returned.

Which in your case should be set to (N-k,N-1).

like image 125
Andy Hayden Avatar answered Oct 17 '22 06:10

Andy Hayden


Actually the sparse routines work for dense numpy arrays also, I think they use some kind of Krylov subspace iteration, therefore they need to compute several matrix-vector products, which means that if your k << N, the sparse routinescould be (marginally?) faster.

Check out the docs http://docs.scipy.org/doc/scipy/reference/tutorial/arpack.html

and the following code (go to take a good coffee with friends until it ends)

import numpy as np
from time import clock
from scipy.linalg import eigh as largest_eigh
from scipy.sparse.linalg.eigen.arpack import eigsh as largest_eigsh

np.set_printoptions(suppress=True)
np.random.seed(0)
N=5000
k=10
X = np.random.random((N,N)) - 0.5
X = np.dot(X, X.T) #create a symmetric matrix

# Benchmark the dense routine
start = clock()
evals_large, evecs_large = largest_eigh(X, eigvals=(N-k,N-1))
elapsed = (clock() - start)
print "eigh elapsed time: ", elapsed

# Benchmark the sparse routine
start = clock()
evals_large_sparse, evecs_large_sparse = largest_eigsh(X, k, which='LM')
elapsed = (clock() - start)
print "eigsh elapsed time: ", elapsed
like image 6
Giuliano Avatar answered Oct 17 '22 05:10

Giuliano