So I know this isn't the recommended technique (preallocating is better), but I got really curious about this timing behavior; I'm curious what might be happening under the hood.
In my head, adding an element to an array could induce a couple different reasonable behaviors in memory depending on implementation: (1) amortized, it would take the same amount of time to add an element like in a linked list where you maintain a pointer to the last element, (2) it could take a big chunk of time now and then to preallocate enough memory for, say, twice as many elements as currently in the list (like a Java array), (3) something more clever than I can think of.
MATLAB seems to do something wonky that I don't quite grock. There seems to be a linear increase in cost with occasional spikes. Any guesses (or intelligent explanations) of what it might be doing? I averaged over simulations (which, I submit, might hide some interesting patterns).
This is what happens when you add one element to the end of an initially empty list iteratively. Why the linear increase? Is there a cool reason for those seemingly periodic spikes?
The code I used to generate it:
% for averaging over
num_averages = 100000;
% number of simulations
num_sims = 10000;
% the time it takes to add one more item, array
time_store = nan(num_sims, num_averages);
% averaging count
for i = 1:num_averages
% an array that grows with every loop
building_array = [];
for j = 1:num_sims
tic;
building_array = [building_array 1];
time_store(j, i) = toc;
end
end
plot(mean(time_store, 2)); hold all;
xlabel('Element num'); ylabel('Time');
This is really interesting! There are peaks at very regular intervals, and the curve is more or less flat in between. After each peak the line rises a bit. Neat! I think this is related to cache lines. You are measuring the cost of copying the array, which is presumably related to the cost of reading and writing cache lines.
If you replace the line
building_array = [building_array 1];
with
building_array(end+1) = 1;
then you won't be copying the data at every iteration loop, but actually appending an element to the array. With this change I get a mostly flat line, with peaks at logarithmically increasing distances (and of logarithmically increasing height too), which is consistent with the common-sense implementation of doubling array size when needed:
Just some more explanation: building_array = [building_array 1]
creates a new array, one element larger than building_array
, and copies building_array
and 1
into it. This is then assigned to building_array
. It seems that MATLAB's JIT has not yet been optimized for this code pattern!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With