I have a large NxN dense symmetric matrix and want the eigenvectors corresponding to the k largest eigenvalues. What's the best way to find them (preferably using numpy but perhaps in general using blas/atlas/lapack if that's the only way to go)? In general N is much much larger then k (say N > 5000, k < 10).
Numpy seems to only have functions for finding the k largest eigenvalues if my starting matrix is sparse.
In SciPy, you can use the linalg.eigh function, with the eigvals
parameter.
eigvals : tuple (lo, hi) Indexes of the smallest and largest (in ascending order) eigenvalues and corresponding eigenvectors to be returned: 0 <= lo < hi <= M-1. If omitted, all eigenvalues and eigenvectors are returned.
Which in your case should be set to (N-k,N-1)
.
Actually the sparse routines work for dense numpy arrays also, I think they use some kind of Krylov subspace iteration, therefore they need to compute several matrix-vector products, which means that if your k << N, the sparse routinescould be (marginally?) faster.
Check out the docs http://docs.scipy.org/doc/scipy/reference/tutorial/arpack.html
and the following code (go to take a good coffee with friends until it ends)
import numpy as np
from time import clock
from scipy.linalg import eigh as largest_eigh
from scipy.sparse.linalg.eigen.arpack import eigsh as largest_eigsh
np.set_printoptions(suppress=True)
np.random.seed(0)
N=5000
k=10
X = np.random.random((N,N)) - 0.5
X = np.dot(X, X.T) #create a symmetric matrix
# Benchmark the dense routine
start = clock()
evals_large, evecs_large = largest_eigh(X, eigvals=(N-k,N-1))
elapsed = (clock() - start)
print "eigh elapsed time: ", elapsed
# Benchmark the sparse routine
start = clock()
evals_large_sparse, evecs_large_sparse = largest_eigsh(X, k, which='LM')
elapsed = (clock() - start)
print "eigsh elapsed time: ", elapsed
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With