I'm inverting covariance matrices with numpy
in python. Covariance matrices are symmetric and positive semi-definite.
I wondered whether there exists an algorithm optimised for symmetric positive semi-definite matrices that is faster than numpy.linalg.inv() (and, of course, whether an implementation of it is readily accessible from Python!). I did not manage to find anything in numpy.linalg or by searching the web.
EDIT:
As observed by @yixuan, positive semi-definite matrices are not invertible in general. I checked that in my case I only get positive definite matrices, so I accepted an answer that works for positive definite input. Anyway, among the low-level LAPACK routines I found the DSY* family, which is optimised for general symmetric/Hermitian matrices, although it seems those routines are missing in scipy (maybe it is just a matter of installed versions).
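For the record, here is a sketch of what using those routines could look like, assuming a scipy build that does expose the sytrf/sytri wrappers; I have not verified their availability or exact signatures across versions, so take this as my reading of the LAPACK docs rather than tested code:

import numpy as np
from scipy.linalg import get_lapack_funcs

def symmetric_inverse(m):
    # Bunch-Kaufman factorisation + inversion for a general (possibly
    # indefinite) symmetric matrix; get_lapack_funcs raises ValueError
    # if the wrappers are missing from this scipy build.
    sytrf, sytri = get_lapack_funcs(('sytrf', 'sytri'), (m,))
    ldu, ipiv, info = sytrf(m)
    if info != 0:
        raise ValueError('sytrf failed with info={}'.format(info))
    inv, info = sytri(ldu, ipiv)
    if info != 0:
        raise ValueError('sytri failed with info={}'.format(info))
    # Only the upper triangle of the result is valid; mirror it.
    return np.triu(inv) + np.triu(inv, k=1).T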
I tried out @percusse's answer, but when I timed its execution, I found that it was about 33% slower than np.linalg.inv
(using a sample of 100K random positive definite 4x4 np.float64
matrices). Here is my implementation:
import numpy as np
from scipy.linalg import lapack

def upper_triangular_to_symmetric(ut):
    # dpotri only fills the upper triangle (the lower one is zeroed),
    # so adding the transposed strict upper triangle symmetrises in place
    ut += np.triu(ut, k=1).T

def fast_positive_definite_inverse(m):
    # Cholesky factorisation, then inversion from the Cholesky factor
    cholesky, info = lapack.dpotrf(m)
    if info != 0:
        raise ValueError('dpotrf failed on input {}'.format(m))
    inv, info = lapack.dpotri(cholesky)
    if info != 0:
        raise ValueError('dpotri failed on input {}'.format(cholesky))
    upper_triangular_to_symmetric(inv)
    return inv
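(For reference, the timing above can be reproduced with a harness along these lines; the sample size and the way the random positive definite matrices are generated here are my own illustrative stand-ins, not the original benchmark.)

import timeit
import numpy as np

def random_positive_definite(n, rng):
    # a @ a.T is positive semi-definite; the n*I shift makes it safely positive definite
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

rng = np.random.default_rng(0)
samples = [random_positive_definite(4, rng) for _ in range(100_000)]
t_np = timeit.timeit(lambda: [np.linalg.inv(m) for m in samples], number=1)
t_fast = timeit.timeit(lambda: [fast_positive_definite_inverse(m) for m in samples], number=1)
print('np.linalg.inv: {:.2f}s  fast: {:.2f}s'.format(t_np, t_fast))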
I tried profiling it, and to my surprise, it spends about 82% of its time calling upper_triangular_to_symmetric
(which is not the "hard" part)! I think this happens because it's doing floating point addition in order to combine the matrices, instead of a simple copy.
I tried an upper_triangular_to_symmetric implementation that is about 87% faster (see this question and answer):
import numpy as np
from scipy.linalg import lapack

inds_cache = {}

def upper_triangular_to_symmetric(ut):
    n = ut.shape[0]
    try:
        inds = inds_cache[n]
    except KeyError:
        # Boolean mask of the strict lower triangle, cached per matrix size
        inds = np.tri(n, k=-1, dtype=bool)
        inds_cache[n] = inds
    # Copy (rather than add) the mirrored upper triangle into the lower triangle
    ut[inds] = ut.T[inds]

def fast_positive_definite_inverse(m):
    cholesky, info = lapack.dpotrf(m)
    if info != 0:
        raise ValueError('dpotrf failed on input {}'.format(m))
    inv, info = lapack.dpotri(cholesky)
    if info != 0:
        raise ValueError('dpotri failed on input {}'.format(cholesky))
    upper_triangular_to_symmetric(inv)
    return inv
This version is about 68% faster than np.linalg.inv and only spends about 42% of its time calling upper_triangular_to_symmetric.
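(As a quick sanity check, my own addition rather than part of the original post, the result can be compared against np.linalg.inv on a random positive definite matrix:)

import numpy as np

rng = np.random.default_rng(42)
a = rng.standard_normal((4, 4))
m = a @ a.T + 4 * np.eye(4)   # positive definite by construction

assert np.allclose(fast_positive_definite_inverse(m), np.linalg.inv(m))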
A high-level API for this doesn't exist yet, but you can use the low-level LAPACK ?POTRI routine family for it. The docstring of sp.linalg.lapack.dpotri is as follows:
Docstring:
inv_a,info = dpotri(c,[lower,overwrite_c])
Wrapper for ``dpotri``.
Parameters
----------
c : input rank-2 array('d') with bounds (n,n)
Other Parameters
----------------
overwrite_c : input int, optional
Default: 0
lower : input int, optional
Default: 0
Returns
-------
inv_a : rank-2 array('d') with bounds (n,n) and c storage
info : int
Call signature: sp.linalg.lapack.dpotri(*args, **kwargs)
The most important part is the info output. If it is zero, that means it solved the equation successfully, regardless of positive definiteness. Because this is a low-level call, no other checks are performed.
>>> M = np.random.rand(10,10)
>>> M = M + M.T
>>> # Make it pos def
>>> M += (1.5*np.abs(np.min(np.linalg.eigvals(M))) + 1) * np.eye(10)
>>> zz , _ = sp.linalg.lapack.dpotrf(M, False, False)
>>> inv_M , info = sp.linalg.lapack.dpotri(zz)
>>> # lapack only returns the upper or lower triangular part
>>> inv_M = np.triu(inv_M) + np.triu(inv_M, k=1).T
Also, if you compare the speed:
>>> %timeit sp.linalg.lapack.dpotrf(M)
The slowest run took 17.86 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.15 µs per loop
>>> %timeit sp.linalg.lapack.dpotri(M)
The slowest run took 24.09 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.08 µs per loop
>>> III = np.eye(10)
>>> %timeit sp.linalg.solve(M,III, sym_pos=1,overwrite_b=1)
10000 loops, best of 3: 40.6 µs per loop
So you get quite a non-negligible speed benefit. If you are working with complex numbers, then you have to use zpotri instead.
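(A minimal sketch of the complex Hermitian case, analogous to the real example above; the matrix construction here is my own illustration, not from the original answer.)

>>> A = np.random.rand(10,10) + 1j*np.random.rand(10,10)
>>> H = A @ A.conj().T + 10*np.eye(10)   # Hermitian positive definite by construction
>>> zz, _ = sp.linalg.lapack.zpotrf(H, False, False)
>>> inv_H, info = sp.linalg.lapack.zpotri(zz)
>>> # mirror the upper triangle, conjugating for the Hermitian case
>>> inv_H = np.triu(inv_H) + np.triu(inv_H, k=1).conj().T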
The question is whether you need the inverse or not. You probably don't: if you need to compute B⁻¹ * A, then solve(B, A) is better for that.
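(To illustrate that point, my own sketch rather than the original answer's: solving directly gives the same result without ever forming the inverse.)

import numpy as np
import scipy.linalg

B = np.random.rand(10, 10)
B = B @ B.T + 10 * np.eye(10)   # symmetric positive definite by construction
A = np.random.rand(10, 3)

# assume_a='pos' selects the Cholesky-based solver (newer scipy;
# older versions spell this sym_pos=True)
x_solve = scipy.linalg.solve(B, A, assume_a='pos')
x_inv = np.linalg.inv(B) @ A
assert np.allclose(x_solve, x_inv)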
To my knowledge there is no standard matrix inverse function for symmetric matrices. In general you need more constraints, such as sparsity, to get good speed-ups from your solvers. However, if you look at scipy.linalg you'll see there are some eigenvalue routines that are optimized for Hermitian (symmetric) matrices.
For example, when I generate a random 200x200 dense matrix and solve the eigenvalues I get:
import numpy as np
from scipy.linalg import inv, pinvh, eig, eigh

B = np.random.rand(200, 200)
B = B + B.T
%timeit inv(B)
1000 loops, best of 3: 915 µs per loop
%timeit pinvh(B)
100 loops, best of 3: 6.93 ms per loop
So no advantage on the inverse but:
%timeit eig(B)
10 loops, best of 3: 39.1 ms per loop
%timeit eigh(B)
100 loops, best of 3: 4.9 ms per loop
a cool 8x speedup on eigenvalues.
If your matrix is sparse, you should check out scipy.sparse.linalg, which has a handful of iterative solvers, some of which (like cg, which assumes a symmetric positive definite matrix) can exploit that structure and so may be faster. However, this is only sensible if your matrix is sparse; it only solves for a particular right-hand side b, and it may not actually be faster, depending on the matrix structure. You'll really have to benchmark it to find out.
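For instance, a minimal sketch along those lines (the matrix and right-hand side here are my own illustrative choices):

import numpy as np
import scipy.sparse as sp_sparse
from scipy.sparse.linalg import cg

n = 1000
# Sparse symmetric positive definite matrix: a 1-D Laplacian plus a diagonal shift
main = 2.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sp_sparse.diags([off, main, off], [-1, 0, 1], format='csr') + 0.1 * sp_sparse.eye(n)
b = np.random.rand(n)

x, info = cg(A, b)   # info == 0 means the iteration converged
assert info == 0
assert np.allclose(A @ x, b, atol=1e-4)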
I asked a similar question about C++ solvers and ultimately found that it's really application dependent and you have to pick the best solver for your problem.