MATLAB's power function to calculate element-wise exponential for a constant base and an array of exponents becomes noticeably faster when the size of the array becomes 512. I expected to see the computation time increase with the input size, however, there is a noticeable drop when there are 512 elements in the array of exponents. Here is a sample code
x_list = 510:514;
for i = 1:numel(x_list)
x = x_list(i);
tic
for j = 1:10000
y = power(2,1:x);
end
toc
end
The output of the code is
Elapsed time is 0.397649 seconds.
Elapsed time is 0.403687 seconds.
Elapsed time is 0.318293 seconds.
Elapsed time is 0.238875 seconds.
Elapsed time is 0.175525 seconds.
What is happening here?
I see the same effect using random numbers for the exponent, as I see using integers in the range 1:n
:
x = 500:540;
t = zeros(size(x));
for ii = 1:numel(x)
%m = 1:x(ii);
m = 500*rand(1,x(ii));
t(ii) = timeit(@()power(2,m));
end
plot(x,t)
When forcing MATLAB to use a single thread with maxNumCompThreads(1)
, and running the code above again, I see this graph instead (note the y-axis, the peaks are just noise):
It looks to me that MATLAB uses a single core to compute the exponent of 511 values, and fires up all cores if the matrix is larger. There is an overhead in using multithreading, it is not worth while to do so for small arrays. The exact point where the overhead is balanced by the time savings depends on many factors, and so hard-coding a fixed threshold for when to switch to multithreaded computation leads to a jump in execution time on systems with different characteristics to those of the system where the threshold was determined.
Note that @norok2 is not seeing this same jump because on their system MATLAB was limited to a single thread.
This is related to size of the number for which the power is computed rather than the size of the container.
If you use random numbers, for varying container size, one does not observe a jump in the timings:
x = 450:1550;
y = zeros(numel(x), 1);
X = rand(1, 10000);
for i = 1:length(x)
f = @() 2 .^ X(1:x(i));
y(i) = timeit(f);
end
figure()
plot(x, y)
Therefore the issue must be with the computation for very large numbers.
I first I thought that this might be related to overflow, but the overflow happens at 2 ^ 1024 == inf
as dictated by the IEEE standards which MATLAB follows, and I thought that for inf
this is would have been much faster than computing a number for real.
This is supported by the following benchmark where the size of the array is kept constant:
x = 450:1550;
y = zeros(numel(x), 1);
X = rand(1, 10000);
for i = 1:length(x)
f = @() 2 .^ (ones(1, 500) * x(i));
y(i) = timeit(f);
end
figure()
plot(x, y)
Why exactly this may be relevant for your setup when 2 ^ 512
instead of 2 ^ 1024
, I do not really understand.
(Note that I used 2 .^ ...
instead of power(2, ...)
but the results are the same.)
Also, running @CrisLuengo's code in my system does not really reproduce any jump.
x = 500:540;
t = zeros(size(x));
for ii = 1:numel(x)
%m = 1:x(ii);
m = 500*rand(1,x(ii));
t(ii) = timeit(@()power(2,m));
end
plot(x,t)
all the evidence so far indicate that spike being a related to JIT latency/warm-up.
Here's some confirmation of what Cris found, using a 4-core Windows machine running MATLAB R2018a. I first tested the following code to show that the specific value of the exponent wasn't the culprit for the jump:
t = zeros(4, 1000);
for p = 1:size(t, 1)
for n = 1:size(t, 2)
t(p, n) = timeit(@() power(2, (2.^(p-1)).*ones(1, n)));
end
end
And here are the results:
For degenerate edge cases where the exponent is 1 (return the same value) or 2 (return the value times itself) the computation runs faster, as expected. However, the jump at an array size of 512 or above indicates overhead is added to these edge cases, compared to a reduction in computation time for exponents of 4 and 8 when the array size exceeds 512. Larger exponent values simply reproduce the upper curves.
I then ran two more tests: one with array size between 1 and 511, and a second with array size between 512 and 1024. Here's what the processor load looked like:
Processor 3 shows a large load spike during the first test, while all 4 processors show load spikes during the second test. This confirms that multithreading is employed for array sizes of 512 or above. This also explains the slower computation for edge cases of larger sizes, since the overhead from multithreading outweighs the speedup provided by splitting up the simpler calculations.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With