 

Woodbury identity for fast matrix inversion—slower than expected

The Woodbury matrix identity states that the inverse of a rank-k correction of some matrix can be computed by doing a rank-k correction to the inverse of the original matrix.

If A is a p × p full rank matrix that is rank corrected by UCV where U is p × k, C is k × k, and V is k × p, then the Woodbury identity is:

(A + UCV)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}

The key is that rather than inverting a p × p matrix, you invert a k × k matrix. In many applications, we can assume k < p. Inverting A can be fast in some cases, for example if A is a diagonal matrix.
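As a quick illustration of that last point (a minimal sketch, not part of the original question): inverting a diagonal matrix only requires taking reciprocals of its diagonal entries, O(p) work rather than O(p^3).

```python
import numpy as np

# Inverting a diagonal matrix is O(p): just reciprocate the diagonal entries.
d = np.array([2.0, 4.0, 5.0, 10.0])
A = np.diag(d)
assert np.allclose(np.linalg.inv(A), np.diag(1.0 / d))
```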

I've implemented this here, assuming A is diagonal and that C is the identity:

import numpy as np

def woodbury(A, U, V, k):
    A_inv = np.diag(1./np.diag(A))  # Fast matrix inversion of a diagonal.
    B_inv = np.linalg.inv(np.eye(k) + V @ A_inv @ U)
    return A_inv - (A_inv @ U @ B_inv @ V @ A_inv)

and sanity checked my implementation by verifying that

n = 100000
p = 1000
k = 100
A = np.diag(np.random.randn(p))
U = np.random.randn(p, k)
V = U.T
M = U @ V + A
M_inv = woodbury(A, U, V, k)
assert np.allclose(M @ M_inv, np.eye(p))

But when I actually compare this to numpy.linalg.inv, my Woodbury function is not nearly as fast as I would have expected. I would expect the time to invert to grow cubically with the dimensionality p. But my results are:

[Timing plot: runtime of woodbury() vs. numpy.linalg.inv as a function of p]

My question is: why is the Woodbury method so slow? Is it simply because I'm comparing Python code to LAPACK or is there something else going on?
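For reference, the benchmarking code is not shown above; a minimal sketch of the kind of comparison being made might look like this (the use of timeit and these particular sizes are assumptions for illustration):

```python
import timeit
import numpy as np

def woodbury(A, U, V, k):
    A_inv = np.diag(1. / np.diag(A))  # Fast matrix inversion of a diagonal.
    B_inv = np.linalg.inv(np.eye(k) + V @ A_inv @ U)
    return A_inv - (A_inv @ U @ B_inv @ V @ A_inv)

p, k = 1000, 100
A = np.diag(np.random.rand(p) + 1.0)  # offset keeps the diagonal well away from zero
U = np.random.randn(p, k)
V = U.T
M = A + U @ V

# Time the Woodbury route against a direct dense inversion of M.
t_wood = timeit.timeit(lambda: woodbury(A, U, V, k), number=3)
t_full = timeit.timeit(lambda: np.linalg.inv(M), number=3)
print(f"woodbury: {t_wood:.4f}s  np.linalg.inv: {t_full:.4f}s")
```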

EDIT: My experiments with einsum() vs. broadcasting

I implemented three versions: (1) using einsum and einsum_path, (2) using broadcasting per the accepted answer, and (3) using both. Here is my implementation using einsum, optimized using einsum_path:

def woodbury_einsum(A, U, V, k):
    A_inv = np.diag(1./np.diag(A))
    tmp   = np.einsum('ab,bc,cd->ad',
                      V, A_inv, U,
                      optimize=['einsum_path', (1, 2), (0, 1)])
    B_inv = np.linalg.inv(np.eye(k) + tmp)
    tmp   = np.einsum('ab,bc,cd,de,ef->af',
                      A_inv, U, B_inv, V, A_inv,
                      optimize=['einsum_path', (0, 1), (0, 1), (0, 2), (0, 1)])
    return A_inv - tmp

And results are here:

[Timing plot: runtime of the einsum, broadcasting, and combined implementations]

So avoiding the computational cost of dense matrix multiplication with a diagonal matrix is more effective than optimizing the ordering and memory footprint of the matrix products via einsum().

asked Nov 30 '18 by jds


1 Answer

As you mention, inverting A + UCV can be made faster by using the Woodbury technique in the case when A is diagonal. That is, in the Woodbury formula your multiplications by A^{-1} should happen in O(p x k) time instead of O(p x k x p), since all you're doing is scaling the rows/columns of the right/left factor.

However, that's not what you're doing in the following code!

def woodbury(A, U, V, k):
    A_inv = np.diag(1./np.diag(A))
    B_inv = np.linalg.inv(np.eye(k) + V @ A_inv @ U)
    return A_inv - (A_inv @ U @ B_inv @ V @ A_inv)

Your A_inv is a full p x p matrix! Yes, the diagonal is the only part that contains non-zeros, but the arithmetic with all of the zero elements will still be carried out on this dense matrix! Instead, you should use NumPy's broadcasting abilities to avoid this unnecessary work. (Or store it as a sparse diagonal matrix using SciPy's sparse module.)

For example,

def efficient_woodbury(A, U, V, k):
    A_inv_diag = 1./np.diag(A)  # note! A_inv_diag is a vector!
    B_inv = np.linalg.inv(np.eye(k) + (V * A_inv_diag) @ U)
    return np.diag(A_inv_diag) - (A_inv_diag.reshape(-1,1) * U @ B_inv @ V * A_inv_diag)

The product V * A_inv_diag is equivalent to your V @ A_inv but runs in only O(p x k) time as opposed to O(p x k x p). Similarly for the other replaced products.
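That equivalence is easy to verify directly (a small check, not part of the original answer):

```python
import numpy as np

p, k = 500, 50
A_inv_diag = np.random.rand(p) + 0.5
U = np.random.randn(p, k)
V = np.random.randn(k, p)

# Broadcasting scales column j of V by A_inv_diag[j] -- the same result as a
# full product with the dense diagonal matrix, but in O(p*k) work.
assert np.allclose(V * A_inv_diag, V @ np.diag(A_inv_diag))

# Likewise on the left: a column vector scales the rows of U.
assert np.allclose(A_inv_diag.reshape(-1, 1) * U, np.diag(A_inv_diag) @ U)
```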

I re-ran your timings on my (slightly faster) machine and produced the following plot:

Woodbury timings.

Much clearer performance distinction!

answered Nov 16 '22 by Chris Swierczewski