I'm trying to produce vectorized code for this equation (and multiple others of the same form); its index form is written out below, after the variable descriptions. It will be evaluated inside ode45, and the only part that changes is Rmn(t), so I am pre-computing the Bessel and sin functions. At the moment my code looks like this:
for iR = 1:nR
    p(:, iR) = sum(Rmn.*J0m(:, :, iR), 1)*sinanz;
end
M and N are the number of terms in my summation, and R and Z are the number of r and z coordinates I'm using. Rmn is an M*N matrix. J0m is an M*N*R array; it is an M*R matrix that is repeated N times. sinanz is an N*Z matrix. J0m and sinanz are pre-computed and don't change.
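Putting the dimensions together, the loop above evaluates, in index form,

\[
p(z_j, r_i, t) \;=\; \sum_{m=1}^{M} \sum_{n=1}^{N} R_{mn}(t)\,\mathtt{J0m}(m, i)\,\mathtt{sinanz}(n, j),
\]

where J0m(m, i) does not depend on n, since that array is just the M*R matrix repeated N times.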
This works but is slow, so I am trying to optimize it. I figured the first step would be to reduce J0m so it is only M*R, but I can't figure out how. I am looking for any suggestions on how to do this and on how to optimize this in general.
As you probably already know, you should pre-allocate p before the loop:

p = zeros(Z, nR);

This prevents the array p from growing at each iteration, speeding up the loop tremendously.
You could vectorize the whole thing by virtue of bsxfun:
% C ≣ M×N×R array of all products Rmn·J0m
C = bsxfun(@times, Rmn, J0m);
% D ≣ M×N×R×Z array of all products C·sinanz
D = bsxfun(@times, C, permute(sinanz, [3 1 4 2]));
% Sum in M and N directions, and re-order
p = permute(sum(sum(D,1),2), [4 3 1 2]);
but I doubt it's going to be faster; MATLAB (read: BLAS) is pretty fast with 2D matrices, but usually not terribly good with higher-dimensional arrays.
I suggest you read about bsxfun; it's also the way to reduce the J0m array to M×R in the way you describe.
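For instance (this is the same form used in the timing test below), once J0m is stored as an M-by-R matrix, each column J0m(:, iR) expands against the M-by-N Rmn inside the loop:

% Collapse the repeated dimension of the existing array (any of the N
% identical slices will do), leaving an M-by-R matrix:
J0m = squeeze(J0m(:, 1, :));
% Pre-allocated loop; bsxfun expands J0m(:, iR) along the N direction:
p = zeros(Z, nR);
for iR = 1:nR
    p(:, iR) = sum( bsxfun(@times, Rmn, J0m(:, iR)), 1 ) * sinanz;
end

(On R2016b and newer, implicit expansion lets you write Rmn .* J0m(:, iR) directly, without bsxfun.)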
Of course, you can get rid of the permutes by properly defining your variables, so let's do a small test on the 'ideal' versions of both the looped and the vectorized code:
%% Initialize some variables
% Number of tests
TESTS = 1e4;
% Your dimensions
M = 5; nR = 4;
N = 2; Z = 3;
% Some dummy data
Rmn = rand(M,N);
sinanz = rand(N,Z);
J0m = rand(M,nR); % NOTE: J0m doesn't have to be 3D anymore by using bsxfun
%% Test 1: your own version, optimized
% Start test
start = tic;
% Repeat the test a couple of times to get a good average
for ii = 1:TESTS
    p1 = zeros(Z,nR);
    for iR = 1:nR
        p1(:, iR) = sum( bsxfun(@times,Rmn,J0m(:, iR)), 1 )*sinanz;
    end
end
stop = toc(start);
% Average execution time
fprintf(1, 'Average time of looped version: %f seconds.\n', stop/TESTS);
%% Vectorized version, using 4D arrays:
% Don't make the permutes part of the test
J0m = permute(J0m, [1 3 2]);
sinanz = permute(sinanz, [3 1 4 2]);
% Start test
start = tic;
% Repeat the test a couple of times to get a good average
for ii = 1:TESTS
    C = bsxfun(@times, Rmn, J0m);
    D = bsxfun(@times, C, sinanz);
    p2 = sum(sum(D,1),2);
end
stop = toc(start);
% Average execution time
fprintf(1, 'Average time of vectorized version: %f seconds.\n', stop/TESTS);
%% Check for equality
maxDifference = max(max(p1 - squeeze(p2)'))
Results:
Average time of looped version : 0.000054 seconds.
Average time of vectorized version: 0.000023 seconds.
maxDifference =
4.440892098500626e-016
which seems pretty good! However, using
M = 50; nR = 40;
N = 20; Z = 30;
you get
Average time of looped version : 0.000708 seconds.
Average time of vectorized version: 0.009835 seconds.
maxDifference =
8.526512829121202e-014
so the vectorized version just got an order of magnitude slower than the looped variant.
Of course, your mileage may vary, but the take-home message is that you should expect this difference to get worse and worse as the dimensions increase.
So, in conclusion:

- stick with the loop, but pre-allocate the p variable
- use bsxfun to reduce the memory footprint (but it won't increase the speed much)