Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MATLAB adding array elements iteratively: time behavior

So I know this isn't the recommended technique (preallocating is better), but I got really curious about this timing behavior; I'm curious what might be happening under the hood.

In my head, adding an element to an array could induce a couple different reasonable behaviors in memory depending on implementation: (1) amortized, it would take the same amount of time to add an element like in a linked list where you maintain a pointer to the last element, (2) it could take a big chunk of time now and then to preallocate enough memory for, say, twice as many elements as currently in the list (like a Java array), (3) something more clever than I can think of.

MATLAB seems to do something wonky that I don't quite grock. There seems to be a linear increase in cost with occasional spikes. Any guesses (or intelligent explanations) of what it might be doing? I averaged over simulations (which, I submit, might hide some interesting patterns).

This is what happens when you add one element to the end of an initially empty list iteratively. Why the linear increase? Is there a cool reason for those seemingly periodic spikes?

iterative time behavior

The code I used to generate it:

% for averaging over
num_averages = 100000;
% number of simulations
num_sims = 10000;
% the time it takes to add one more item, array
time_store = nan(num_sims, num_averages);

% averaging count
for i = 1:num_averages

    % an array that grows with every loop
    building_array = [];

    for j = 1:num_sims
        tic;
        building_array = [building_array 1];
        time_store(j, i) = toc;
    end
end

plot(mean(time_store, 2)); hold all;
xlabel('Element num'); ylabel('Time');
like image 564
Peter Barrett Bryan Avatar asked Jan 19 '18 23:01

Peter Barrett Bryan


1 Answers

This is really interesting! There are peaks at very regular intervals, and the curve is more or less flat in between. After each peak the line rises a bit. Neat! I think this is related to cache lines. You are measuring the cost of copying the array, which is presumably related to the cost of reading and writing cache lines.

If you replace the line

building_array = [building_array 1];

with

building_array(end+1) = 1;

then you won't be copying the data at every iteration loop, but actually appending an element to the array. With this change I get a mostly flat line, with peaks at logarithmically increasing distances (and of logarithmically increasing height too), which is consistent with the common-sense implementation of doubling array size when needed:

enter image description here

Just some more explanation: building_array = [building_array 1] creates a new array, one element larger than building_array, and copies building_array and 1 into it. This is then assigned to building_array. It seems that MATLAB's JIT has not yet been optimized for this code pattern!

like image 115
Cris Luengo Avatar answered Oct 13 '22 01:10

Cris Luengo