 

How to compute diag(X %*% solve(A) %*% t(X)) efficiently without taking matrix inverse?

I need the following diagonal:

diag(X %*% solve(A) %*% t(X))

where A is a full-rank square matrix and X is a rectangular matrix. Both A and X are sparse.

I know that explicitly computing the inverse of a matrix is bad practice unless you really need it. However, I can't see how to rewrite the formula so that solve(A) is replaced by the two-argument form of solve, which solves a linear system without explicit inversion. Is that possible?

asked Sep 16 '16 by Coolwater


1 Answer

Perhaps before I really start, I need to mention that if you do

diag(X %*% solve(A, t(X)))

the matrix inverse is avoided. solve(A, B) performs an LU factorization and uses the resulting triangular factors to solve the linear system A x = B. If you leave B unspecified, it defaults to the identity matrix, so you end up explicitly computing the inverse of A.
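
A quick check in base R confirms this. The toy sizes below are mine, chosen only for illustration:

    set.seed(1)
    A <- crossprod(matrix(rnorm(25), 5)) + diag(5)  # full-rank 5 x 5
    X <- matrix(rnorm(15), 3, 5)                    # 3 x 5

    d1 <- diag(X %*% solve(A) %*% t(X))  # explicit inverse (avoid this)
    d2 <- diag(X %*% solve(A, t(X)))     # one linear solve, no inverse
    all.equal(d1, d2)                    # TRUE, up to rounding

    all.equal(solve(A), solve(A, diag(5)))  # solve(A) is the identity-RHS case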

You should read ?solve carefully, and possibly several times, for hints. It says it is based on the LAPACK routine DGESV, where you can find the numerical linear algebra behind the scenes.

DGESV computes the solution to a real system of linear equations

   A * X = B,

where A is an N-by-N matrix and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is
used to factor A as

  A = P * L * U,

where P is a permutation matrix, L is unit lower triangular, and U is
upper triangular.  The factored form of A is then used to solve the
system of equations A * X = B.
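
If you want to see these factors for yourself, the Matrix package (a natural choice anyway, since your A and X are sparse) can expose them. A small sketch on a dense matrix with arbitrary entries of my own:

    library(Matrix)

    A <- Matrix(c(2, 1, 4,
                  3, 5, 7,
                  1, 8, 6), 3, 3)

    ef <- expand(lu(A))      # list of factors: P (permutation), L, U
    with(ef, P %*% L %*% U)  # reconstructs A, i.e. A = P * L * U
    all.equal(as.matrix(with(ef, P %*% L %*% U)), as.matrix(A))  # TRUE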

The difference between solve(A, t(X)) and solve(A) %*% t(X) is a matter of efficiency. The general matrix multiplication %*% in the latter is much more expensive than solve itself.
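
A rough timing comparison makes the point; the sizes here are arbitrary:

    set.seed(42)
    n <- 1000
    A <- crossprod(matrix(rnorm(n * n), n)) + diag(n)  # full-rank n x n
    X <- matrix(rnorm(200 * n), 200, n)

    system.time(S1 <- solve(A) %*% t(X))  # invert, then dense multiply
    system.time(S2 <- solve(A, t(X)))     # factorize once, triangular solves only
    all.equal(S1, S2)                     # same result, up to rounding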

Yet, even if you use solve(A, t(X)), you are not on the fastest track, as you have another %*%.

Also, as you only want the diagonal elements, you waste considerable time computing the full matrix first. My answer below will get you on the fastest track.

I have rewritten everything in LaTeX, and the content is greatly expanded, too, including references to the R implementation. Extra resources are given at the end on QR factorization, singular value decomposition and pivoted Cholesky factorization, in case you find them useful.


[The full derivation was posted as three images titled "overview", "chol" and "lu"; the images themselves are not recoverable here. A minimal sketch of the two routes follows.]

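The sketch below is my reconstruction of the standard tricks the image titles point to, not a verbatim copy of the original derivation. The colSums trick extracts the diagonal without ever forming the full matrix product; the Cholesky route additionally assumes A is symmetric positive definite:

    set.seed(1)
    n <- 5
    A <- crossprod(matrix(rnorm(n * n), n)) + diag(n)  # symmetric positive definite
    X <- matrix(rnorm(3 * n), 3, n)

    ## LU route: one call to solve (DGESV), then only the diagonal
    S <- solve(A, t(X))        # S = A^{-1} t(X)
    d_lu <- colSums(t(X) * S)  # d[i] = sum_j X[i, j] * S[j, i]

    ## Cholesky route: requires A symmetric positive definite
    R <- chol(A)                               # A = t(R) %*% R, R upper triangular
    W <- backsolve(R, t(X), transpose = TRUE)  # solves t(R) %*% W = t(X)
    d_chol <- colSums(W ^ 2)                   # diag(t(W) %*% W)

    all.equal(d_lu, diag(X %*% solve(A) %*% t(X)))  # TRUE, up to rounding
    all.equal(d_chol, d_lu)                         # TRUE, up to rounding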

Extra resources

In case you are interested in pivoted Cholesky factorization, you can refer to Compute projection / hat matrix via QR factorization, SVD (and Cholesky factorization?). There I also discuss QR factorization and singular value decomposition.

The above link is set in ordinary least square regression context. For weighted least square, you can refer to Get hat matrix from QR decomposition for weighted least square regression.

QR factorization also takes compact form. Should you want to know more on how QR factorization is done and stored, you may refer to What is "qraux" returned by QR decomposition.

These questions and answers all focus on numerical matrix computations. The following give some statistical applications:

  • linear model with lm(): how to get prediction variance of sum of predicted values
  • Sampling from multivariate normal distribution with rank-deficient covariance matrix via Pivoted Cholesky Factorization
  • Boosting lm() by a factor of 3, using Cholesky method rather than QR method
answered Oct 03 '22 by Zheyuan Li