I have noticed many individual questions on SO but no one good guide to MATLAB optimization.
Common Questions:
I don't think that these questions will stop, but I'm hoping that the ideas presented here will them something centralized to refer to.
Optimizing Matlab code is kind of a black-art, there is always a better way to do it. And sometimes it is straight-up impossible to vectorize your code.
So my question is: when vectorization is impossible or extremely complicated, what are some of your tips and tricks to optimize MATLAB code? Also if you have any common vectorization tricks I wouldn't mind seeing them either.
All of these tests are performed on a machine that is shared with others, so it is not a perfectly clean environment. Between each test I clear the workspace to free up memory.
Please don't pay attention to the individual numbers, just look at the differences between the before and after optimisation times.
Note: The tic
and toc
calls I have placed in the code are to show where I am measuring the time taken.
The simple act of pre-allocating arrays in Matlab can give a huge speed advantage.
tic;
for i = 1:100000
my_array(i) = 5 * i;
end
toc;
This takes 47 seconds
tic;
length = 100000;
my_array = zeros(1, length);
for i = 1:length
my_array(i) = 5 * i;
end
toc;
This takes 0.1018 seconds
47 seconds to 0.1 seconds for a single line of code added is an amazing improvement. Obviously in this simple example you could vectorize it to my_array = 5 * 1:100000
(which took 0.000423 seconds) but I am trying to represent the more complicated times when vectorization isn't an option.
I recently found that the zeros function (and others of the same nature) are not as fast at pre-allocating as simply setting the last value to 0:
tic;
length = 100000;
my_array(length) = 0;
for i = 1:length
my_array(i) = 5 * i;
end
toc;
This takes 0.0991 seconds
Now obviously this tiny difference doesn't prove much but you'll have to believe me over a large file with many of these optimisations the difference becomes a lot more apparent.
Why does this work?
The pre-allocation methods allocate a chunk of memory for you to work with. This memory is contiguous and can be pre-fetched, just like an Array in C++ or Java. However if you do not pre-allocate then MATLAB will have to dynamically find more and more memory for you to use. As I understand it, this behaves differently to a Java ArrayList and is more like a LinkedList where different chunks of the array are split all over the place in memory.
Not only is this slower when you write data to it (47 seconds!) but it is also slower every time you access it from then on. In fact, if you absolutely CAN'T pre-allocate then it is still useful to copy your matrix to a new pre-allocated one before you start using it.
What if I don't know how much space to allocate?
This is a common question and there are a few different solutions:
.mat
file or similar so that it could be read quickly at a later date.How do I pre-allocate a complicated structure?
Pre-allocating space for simple data-types is easy, as we have already seen, but what if it is a very complex data type such as a struct of structs?
I could never work out to explicitly pre-allocate these (I am hoping someone can suggest a better method) so I came up with this simple hack:
tic;
length = 100000;
% Reverse the for-loop to start from the last element
for i = 1:length
complicated_structure = read_from_file(i);
end
toc;
This takes 1.5 minutes
tic;
length = 100000;
% Reverse the for-loop to start from the last element
for i = length:-1:1
complicated_structure = read_from_file(i);
end
% Flip the array back to the right way
complicated_structure = fliplr(complicated_structure);
toc;
This takes 6 seconds
This is obviously not perfect pre-allocation, and it takes a little while to flip the array afterwards, but the time improvements speak for themselves. I'm hoping someone has a better way to do this, but this is a pretty good hack in the mean time.
In terms of memory usage, an Array of Structs is orders of magnitude worse than a Struct of Arrays:
% Array of Structs
a(1).a = 1;
a(1).b = 2;
a(2).a = 3;
a(2).b = 4;
Uses 624 Bytes
% Struct of Arrays
a.a(1) = 1;
a.b(1) = 2;
a.a(2) = 3;
a.b(2) = 4;
Uses 384 Bytes
As you can see, even in this simple/small example the Array of Structs uses a lot more memory than the Struct of Arrays. Also the Struct of Arrays is in a more useful format if you want to plot the data.
Each Struct has a large header, and as you can see an array of structs repeats this header multiple times where the struct of arrays only has the one header and therefore uses less space. This difference is more obvious with larger arrays.
The less number of freads
(or any system call for that matter) you have in your code, the better.
tic;
for i = 1:100
fread(fid, 1, '*int32');
end
toc;
The previous code is a lot slower than the following:
tic;
fread(fid, 100, '*int32');
toc;
You might think that's obvious, but the same principle can be applied to more complicated cases:
tic;
for i = 1:100
val1(i) = fread(fid, 1, '*float32');
val2(i) = fread(fid, 1, '*float32');
end
toc;
This problem is no longer simple because in memory the floats are represented like this:
val1 val2 val1 val2 etc.
However you can use the skip
value of fread to achieve the same optimizations as before:
tic;
% Get the current position in the file
initial_position = ftell(fid);
% Read 100 float32 values, and skip 4 bytes after each one
val1 = fread(fid, 100, '*float32', 4);
% Set the file position back to the start (plus the size of the initial float32)
fseek(fid, position + 4, 'bof');
% Read 100 float32 values, and skip 4 bytes after each one
val2 = fread(fid, 100, '*float32', 4);
toc;
So this file read was accomplished using two fread
s instead of 200, a massive improvement.
I recently worked on some code that used many function calls, all of which were located in separate files. So lets say there were 100 separate files, all calling each other. By "inlining" this code into one function I saw a 20% improvement in execution speed from 9 seconds.
Obviously you would not do this at the expense of re-usability, but in my case the functions were automatically generated and not reused at all. But we can still learn from this and avoid excessive function calls where they are not really needed.
External MEX functions incur an overhead for being called. Therefore one call to a large MEX function is a lot more efficient than many calls to smaller MEX functions.
When plotting disconnected data such as a set of vertical lines, the traditional way to do this in Matlab is to iterate multiple calls to line
or plot
using hold on
. However if you have a large number of individual lines to plot, this becomes very slow.
The technique I have found uses the fact that you can introduce NaN
values into data to plot and it will cause a break in the data.
The below contrived example converts a set of x_values, y1_values, and y2_values (where the line is from [x, y1] to [x, y2]) to a format appropriate for a single call to plot
.
For example:
% Where x is 1:1000, draw vertical lines from 5 to 10.
x_values = 1:1000;
y1_values = ones(1, 1000) * 5;
y2_values = ones(1, 1000) * 10;
% Set x_plot_values to [1, 1, NaN, 2, 2, NaN, ...];
x_plot_values = zeros(1, length(x_values) * 3);
x_plot_values(1:3:end) = x_values;
x_plot_values(2:3:end) = x_values;
x_plot_values(3:3:end) = NaN;
% Set y_plot_values to [5, 10, NaN, 5, 10, NaN, ...];
y_plot_values = zeros(1, length(x_values) * 3);
y_plot_values(1:3:end) = y1_values;
y_plot_values(2:3:end) = y2_values;
y_plot_values(3:3:end) = NaN;
figure; plot(x_plot_values, y_plot_values);
I have used this method to print thousands of tiny lines and the performance improvements were immense. Not only in the initial plot, but the performance of subsequent manipulations such as zoom or pan operations improved as well.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With