Sorry if this is obvious but I searched a while and did not find anything (or missed it).
I'm trying to solve linear systems of the form Ax=B with A a 4x4 matrix, and B a 4x1 vector.
I know that for a single system I can use mldivide
to obtain x: x=A\B
.
However I am trying to solve a great number of systems (possibly > 10000) and I am reluctant to use a for loop because I was told it is notably slower than matrix formulation in many MATLAB problems.
My question is then: is there a way to solve Ax=B using vectorization with A 4x4x N and B a matrix 4x N ?
PS: I do not know if it is important but the B vector is the same for all the systems.
You should use a for loop. There might be a benefit in precomputing a factorization and reusing it, if A
stays the same and B
changes. But for your problem where A
changes and B
stays the same, there's no alternative to solving N linear systems.
You shouldn't worry too much about the performance cost of loops either: the MATLAB JIT compiler means that loops can often be just as fast on recent versions of MATLAB.
I don't think you can optimize this further. As explained by @Tom, since A
is the one changing, there is no benefit in factoring the various A
's beforehand...
Besides the looped solution is pretty fast given the dimensions you mention:
A = rand(4,4,10000);
B = rand(4,1); %# same for all linear systems
tic
X = zeros(4,size(A,3));
for i=1:size(A,3)
X(:,i) = A(:,:,i)\B;
end
toc
Elapsed time is 0.168101 seconds.
Here's the problem: you're trying to perform a 2D operation (mldivide) on a 3d matrix. No matter how you look at it, you need reference the matrix by index which is where the time penalty kicks in... it's not the for loop which is the problem, but it's how people use them. If you can structure your problem differently, then perhaps you can find a better option, but right now you have a few options:
1 - mex
2 - parallel processing (write a parfor loop)
3 - CUDA
Here's a rather esoteric solution that takes advantage of MATLAB's peculiar optimizations. Construct an enormous 4k x 4k sparse matrix with your 4x4 blocks down the diagonal. Then solve all simultaneously.
On my machine this gets the same solution up to single precision accuracy as @Amro/Tom's for-loop solution, but faster.
n = size(A,1);
k = size(A,3);
AS = reshape(permute(A,[1 3 2]),n*k,n);
S = sparse( ...
repmat(1:n*k,n,1)', ...
bsxfun(@plus,reshape(repmat(1:n:n*k,n,1),[],1),0:n-1), ...
AS, ...
n*k,n*k);
X = reshape(S\repmat(B,k,1),n,k);
for a random example:
For k = 10000
For loop: 0.122570 seconds.
Giant sparse system: 0.032287 seconds.
If you know that your 4x4 matrices are positive definite then you can use chol
on S to improve the accuracy.
This is silly. But so is how slow matlab's for loops still are in 2015, even with JIT. This solution seems to find a sweet spot when k is not too large so everything still fits into memory.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With