Fastest way to solve least square for overdetermined system

I have a matrix A of size m×n (m on the order of ~100K and n ~500) and a vector b. My matrix is also ill-conditioned and rank-deficient. I want to find the least-squares solution to Ax = b, and to this end I have compared some methods:

  • scipy.linalg.lstsq (time/residual) : 14s, 626.982
  • scipy.sparse.linalg.lsmr (time/residual) : 4.5s, 626.982 (same accuracy)

Now, I have observed that when my matrix is not rank-deficient, forming the normal equations and solving them with a Cholesky factorization is the fastest way to solve my problem. So my question is this: if I am not interested in the minimum-norm solution, is there a way to get a solution (any solution) to the normal equations A^T A x = A^T b when A^T A is singular? I have tried scipy.linalg.solve, but it raises LinAlgError for singular matrices. I would also like to know: if A is such that m >> n, ill-conditioned, and possibly not of full column rank, which method should one use in terms of time and residual accuracy (or any other metric)? Any thoughts and help are greatly appreciated. Thanks!

asked Aug 06 '17 by user1131274



1 Answer

I'd say the "correct" way of going about this is to use the SVD, look at your singular value spectrum, and figure out how many singular values you want to keep, i.e., figure out how close you want Ax to be to b. Something along these lines:

import numpy as np
import scipy.linalg as la

def svd_solve(a, b):
    # Keep only singular values above a tolerance (the effective numerical rank),
    # then solve in the truncated basis.
    U, s, Vt = la.svd(a, full_matrices=False)
    r = np.sum(s >= 1e-12)
    temp = np.dot(U[:, :r].T, b) / s[:r]
    return np.dot(Vt[:r, :].T, temp)

However, for a matrix of size (100000, 500), this is just going to be way too slow. I would recommend implementing least squares by yourself, and adding a small amount of regularization to avoid the issue of the matrix becoming singular.

def naive_solve(a, b, lamda):
    # Regularized normal equations: (A^T A + lamda*I) x = A^T b
    return la.solve(np.dot(a.T, a) + lamda * np.identity(a.shape[1]),
                    np.dot(a.T, b))

def pos_solve(a, b, lamda):
    # Same system, but tell LAPACK the matrix is positive definite
    return la.solve(np.dot(a.T, a) + lamda * np.identity(a.shape[1]),
                    np.dot(a.T, b), assume_a='pos')

Here's a timing analysis on my workstation*:

>>> %timeit la.lstsq(a, b)
1.84 s ± 39.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit naive_solve(a, b, 1e-25)
140 ms ± 4.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit pos_solve(a, b, 1e-25)
135 ms ± 768 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

*I somehow don't seem to have scipy.sparse.linalg.lsmr on my machine, so I couldn't compare against that.

It doesn't seem to do much here, but I've seen elsewhere that adding the assume_a='pos' flag can actually give you a lot of benefit. You can certainly do this here, since A^T A is guaranteed to be positive semi-definite, and the lamda makes it positive definite. You might have to play with lamda a little to bring your error sufficiently low.
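Since A^T A + lamda*I is guaranteed positive definite, another option in the same spirit is to Cholesky-factor it once and reuse the factor, which pays off if you have several right-hand sides b. A minimal sketch (the name chol_solve is mine, not part of SciPy):

```python
import numpy as np
import scipy.linalg as la

def chol_solve(a, b, lamda):
    # Form the regularized normal equations and Cholesky-factor once.
    ata = np.dot(a.T, a) + lamda * np.identity(a.shape[1])
    c = la.cho_factor(ata)                 # (factor, lower) pair
    return la.cho_solve(c, np.dot(a.T, b))
```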

And in terms of error:

>>> xhat_lstsq = la.lstsq(a, b)[0]
>>> la.norm(np.dot(a, xhat_lstsq) - b)
1.4628232073579952e-13
>>> xhat_naive = naive_solve(a, b, 1e-25)
>>> la.norm(np.dot(a, xhat_naive) - b)
7.474566255470176e-13
>>> xhat_pos = pos_solve(a, b, 1e-25)
>>> la.norm(np.dot(a, xhat_pos) - b)
7.476075564322223e-13

PS: I generated an a and a b of my own like this:

s = np.logspace(1, -20, 500)
u = np.random.randn(100000, 500)
u /= la.norm(u, axis=0)[np.newaxis, :]
a = np.dot(u, np.diag(s))
x = np.random.randn(500)
b = np.dot(a, x)

My a isn't completely singular, but near-singular.
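Incidentally, since lsmr was the fastest method in your own comparison: it accepts a damp parameter that applies the same kind of Tikhonov regularization iteratively, without ever forming the (n, n) matrix A^T A. A sketch (lsmr_solve is just a hypothetical wrapper name):

```python
import numpy as np
from scipy.sparse.linalg import lsmr

def lsmr_solve(a, b, damp=1e-8):
    # Iteratively minimizes ||Ax - b||^2 + damp^2 * ||x||^2,
    # so singular A^T A never has to be formed or factored.
    result = lsmr(a, b, damp=damp)
    return result[0]  # the solution vector x
```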

Response to comment

I guess what you are trying to do is to find a feasible point under some linear equality constraints. The trouble here is that you don't know which constraints are important. Each of the 100,000 rows of A gives you a new constraint, out of which at most 500, but possibly far fewer (because of the rank deficiency), actually matter. The SVD gives you a way of figuring out which dimensions are important. I don't know of another way to do this: you might find something in the convex optimization or linear programming literature. If you know a priori that the rank of A is r, then you can try to find only the first r singular values and corresponding vectors, which might save time if r << n.
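If you do know (or can bound) that rank r a priori, scipy.sparse.linalg.svds computes only the k largest singular triplets instead of the full SVD. A sketch of that idea (truncated_svd_solve is my name for it):

```python
import numpy as np
from scipy.sparse.linalg import svds

def truncated_svd_solve(a, b, r):
    # Compute only the r largest singular triplets (ARPACK under the hood),
    # then solve in that r-dimensional subspace.
    u, s, vt = svds(a, k=r)
    return np.dot(vt.T, np.dot(u.T, b) / s)
```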

Regarding your other question, the minimum norm solution isn't the "best" or even the "correct" solution. Since your system is underdetermined, you need to throw in some additional constraints or assumptions which will help you find a unique solution. The minimum norm constraint is one such. The minimum norm solution is often considered to be "good", because if x is some physical signal which you are trying to design, then an x with lower norm often corresponds to a physical signal with lower energy, which then translates to cost savings, etc. Alternatively, if x is a parameter of some system you are trying to estimate, then choosing the minimum norm solution means you are assuming that the system is efficient in some way, and uses only the minimum energy needed to produce the outcome b. Hope that all makes sense.

answered Sep 18 '22 by Praveen