The Woodbury matrix identity states that the inverse of a rank-k correction of some matrix can be computed by doing a rank-k correction to the inverse of the original matrix.
If A is a p × p full-rank matrix that is rank-corrected by UCV, where U is p × k, C is k × k, and V is k × p, then the Woodbury identity is:
(A + UCV)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}
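As a quick numerical sanity check of the identity itself (small shapes, both sides computed naively with np.linalg.inv, purely for correctness rather than speed; the diagonal shifts are just to keep the random matrices well-conditioned):

import numpy as np

p, k = 5, 2
A = np.random.randn(p, p) + p * np.eye(p)  # shifted to keep A well-conditioned
U = np.random.randn(p, k)
C = np.random.randn(k, k) + k * np.eye(k)  # likewise for C
V = np.random.randn(k, p)

A_inv = np.linalg.inv(A)
lhs = np.linalg.inv(A + U @ C @ V)
rhs = A_inv - A_inv @ U @ np.linalg.inv(np.linalg.inv(C) + V @ A_inv @ U) @ V @ A_inv
assert np.allclose(lhs, rhs)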
The key is that rather than inverting a p × p matrix, you invert a k × k matrix. In many applications, we can assume k < p. Inverting A can be fast in some cases, for example if A is a diagonal matrix.
I've implemented this here, assuming A is diagonal and that C is the identity:
import numpy as np

def woodbury(A, U, V, k):
    A_inv = np.diag(1. / np.diag(A))  # Fast matrix inversion of a diagonal.
    B_inv = np.linalg.inv(np.eye(k) + V @ A_inv @ U)
    return A_inv - (A_inv @ U @ B_inv @ V @ A_inv)
and sanity-checked my implementation by verifying that:
n = 100000
p = 1000
k = 100
A = np.diag(np.random.randn(p))
U = np.random.randn(p, k)
V = U.T
M = U @ V + A
M_inv = woodbury(A, U, V, k)
assert np.allclose(M @ M_inv, np.eye(p))
But when I actually compare this to numpy.linalg.inv, my Woodbury function is not nearly as fast as I would have expected. I would expect the time to invert to grow cubically with the dimensionality p. But my results are:

[timing plot: woodbury vs. numpy.linalg.inv as p grows]
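For reference, the comparison can be scripted roughly as follows — a sketch reusing woodbury from above, where bench is a hypothetical timing helper rather than the exact harness behind the plot:

import time

def bench(f, *args, reps=5):
    best = float('inf')  # keep the best of several runs to reduce noise
    for _ in range(reps):
        start = time.perf_counter()
        f(*args)
        best = min(best, time.perf_counter() - start)
    return best

for p in (250, 500, 1000, 2000):
    k = p // 10
    A = np.diag(np.random.randn(p))
    U = np.random.randn(p, k)
    V = U.T
    M = A + U @ V
    print(p, bench(np.linalg.inv, M), bench(woodbury, A, U, V, k))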
My question is: why is the Woodbury method so slow? Is it simply because I'm comparing Python code to LAPACK, or is there something else going on?
EDIT: My experiments with einsum() vs. broadcasting
I implemented three versions: (1) using einsum and einsum_path, (2) using broadcasting per the accepted answer, and (3) using both. Here is my implementation using einsum, optimized using einsum_path:
def woodbury_einsum(A, U, V, k):
    A_inv = np.diag(1. / np.diag(A))
    tmp = np.einsum('ab,bc,cd->ad',
                    V, A_inv, U,
                    optimize=['einsum_path', (1, 2), (0, 1)])
    B_inv = np.linalg.inv(np.eye(k) + tmp)
    tmp = np.einsum('ab,bc,cd,de,ef->af',
                    A_inv, U, B_inv, V, A_inv,
                    optimize=['einsum_path', (0, 1), (0, 1), (0, 2), (0, 1)])
    return A_inv - tmp
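The hard-coded contraction orders above come from np.einsum_path. A minimal sketch of how to obtain them, with shapes matching the experiment:

p, k = 1000, 100
A_inv = np.diag(1. / np.random.randn(p))
U = np.random.randn(p, k)
V = U.T

path, report = np.einsum_path('ab,bc,cd->ad', V, A_inv, U, optimize='optimal')
print(path)    # e.g. ['einsum_path', (1, 2), (0, 1)]
print(report)  # human-readable breakdown of the contraction cost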
And results are here:

[timing plot comparing the three implementations]
So avoiding the computational cost of matrix multiplication with a diagonal matrix is faster than optimizing the ordering of matrix multiplications and the memory footprint using einsum().
ACCEPTED ANSWER:

As you mention, inverting A + UCV can be made faster by using the Woodbury technique in the case when A is diagonal. That is, in the Woodbury formula your multiplications by A^{-1} should happen in O(p × k) time instead of O(p × k × p), since all you're doing is scaling the rows/columns of the right/left term.
However, that's not what you're doing in the following code!
def woodbury(A, U, V, k):
    A_inv = np.diag(1./np.diag(A))
    B_inv = np.linalg.inv(np.eye(k) + V @ A_inv @ U)
    return A_inv - (A_inv @ U @ B_inv @ V @ A_inv)
Your A_inv is a full p × p matrix! Yes, the diagonal is the only part that contains non-zeros, but the arithmetic with all of the zero elements will still be carried out in this dense matrix. Instead, you should use NumPy's broadcasting abilities to avoid this unnecessary work. (Or use a sparse diagonal matrix via SciPy's sparse module.)
For example,
def efficient_woodbury(A, U, V, k):
    A_inv_diag = 1. / np.diag(A)  # note! A_inv_diag is a vector!
    B_inv = np.linalg.inv(np.eye(k) + (V * A_inv_diag) @ U)
    return np.diag(A_inv_diag) - (A_inv_diag.reshape(-1, 1) * U @ B_inv @ V * A_inv_diag)
The product V * A_inv_diag is equivalent to your V @ A_inv but runs in only O(p × k) time as opposed to O(p × k × p). Similarly for the other replaced products.
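To make the equivalence concrete, a quick check (with a as a stand-in for A_inv_diag):

p, k = 1000, 100
a = np.random.randn(p)  # stand-in for A_inv_diag
U = np.random.randn(p, k)
V = U.T

# Right-multiplying by diag(a) scales the columns of V:
assert np.allclose(V * a, V @ np.diag(a))
# Left-multiplying by diag(a) scales the rows of U:
assert np.allclose(a.reshape(-1, 1) * U, np.diag(a) @ U)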
I re-ran your timings on my (slightly faster) machine and produced the following plot:

[timing plot comparing the implementations]

Much clearer performance distinction!
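For completeness, efficient_woodbury can be sanity-checked the same way as the original implementation:

p, k = 1000, 100
A = np.diag(np.random.randn(p))
U = np.random.randn(p, k)
V = U.T
M = A + U @ V
assert np.allclose(M @ efficient_woodbury(A, U, V, k), np.eye(p))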