In my program, I have a large (e.g. 100x100) array of structs, each struct having a fair amount of data (e.g. 1000 numbers, and some other fields). For example:
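% Counting down assigns database(100,100) first, so the struct array
% is allocated at its full size once instead of growing in the loop.
for x = 100 : -1 : 1
    for y = 100 : -1 : 1
        database(y,x).data = rand(30);
        database(y,x).name = sprintf('my %d %d', x, y);
    end
end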
I would like to do a computation of 10-20 lines of code with my data; for example:
result = 0;
for x = 10 : 90
    for y = 10 : 90
        for dx = -9 : 9
            for dy = -9 : 9
                result = result + database(y + dy, x + dx).data(1, 1);
                result = result + 2 * database(y + dy, x + dx).data(1, 2) * database(y + dy, x + dx).data(2, 2);
                % ... more stuff here
            end
        end
    end
end
My code refers to the current element of the database as database(y + dy, x + dx). To make it shorter, I give it a name (C++ would call it a "reference"):
temp = database(y + dy, x + dx);
result = result + temp.data(1, 1);
result = result + 2 * temp.data(1, 2) * temp.data(2, 2);
This makes my code much shorter and clearer. However, it is also much slower, and profiling shows that the assignment temp = ... takes 70% of my execution time.
So my assumption is that Matlab copies the contents of the largish database element, eating my time. I think Matlab should be smart enough to do "copy-on-write", that is, copy the stuff only when it is changed later. However, this is not what happens in my case - my code only reads from the database, and doesn't change it.
So, how can I make an efficient read-only reference to a struct?
Passing by reference uses a pointer to access the structure arguments. If the function writes to an element of the input structure, it overwrites the input value. Passing by value makes a copy of the input or output structure argument. To reduce memory usage and execution time, use pass by reference.
value = getfield(S, field) returns the value in the specified field of the structure S. For example, if S.a = 1, then getfield(S, 'a') returns 1. As an alternative to getfield, use dot notation, value = S.field.
To index into a structure array, use array indexing. For example, patient(2) returns the second structure.
You can pass a MATLAB structure to the function and let MATLAB autoconvert the argument. Or you can pass a pointer to a structure, which avoids creating a copy of the structure.
Well, there is definitely copying going on when you do:
temp = database(y + dy, x + dx)
This could be reduced perhaps by using:
temp = database(y + dy, x + dx).data
But obviously that would only work if you were just interested in the data in this part of the code.
That being said, I am not sure whether you can work around it without restructuring your data in inconvenient ways. First, you could benchmark your code after replacing every occurrence of temp with database(y + dy, x + dx), to make sure that avoiding the copy really helps. If it does, you could try passing database(y + dy, x + dx) to a subfunction, since variables passed to a function are typically not copied as long as the function only reads them. However, I am not sure whether this also applies to parts of variables.
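The subfunction idea could look roughly like the sketch below, assuming the struct layout from the question. The names sumNeighbourhood and elementTerm are made up for illustration, and whether this is actually faster than the temp version is not guaranteed; it has to be measured.

```matlab
function result = sumNeighbourhood(database)
% Hypothetical driver: same loops as in the question, but each element
% is handed to a local function instead of being stored in temp.
result = 0;
for x = 10 : 90
    for y = 10 : 90
        for dx = -9 : 9
            for dy = -9 : 9
                result = result + elementTerm(database(y + dy, x + dx));
            end
        end
    end
end
end

function t = elementTerm(elem)
% Hypothetical helper: only reads from elem, never writes to it.
t = elem.data(1, 1) + 2 * elem.data(1, 2) * elem.data(2, 2);
end
```

To compare variants, something like timeit(@() sumNeighbourhood(database)) should give a more stable number than a single tic/toc pair.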
If none of the above helps, consider some of the oldest advice in the book:
For efficient calculations on big chunks of data, consider using matrices.
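As a rough sketch of what that could look like for the two lines of arithmetic shown in the question (the elided "more stuff" is not covered), the data fields can be stacked into one numeric array and the 19x19 neighbourhood sum done with conv2. The variable names allData, term and windowSums are illustrative, and the setup loop only recreates the example database so the snippet runs on its own.

```matlab
% Example database as in the question (only the data field is needed here).
for x = 100 : -1 : 1
    for y = 100 : -1 : 1
        database(y, x).data = rand(30);
    end
end

% Stack all 30x30 matrices so that allData(:, :, y, x) equals
% database(y, x).data.
allData = reshape([database.data], 30, 30, 100, 100);

% Pull out the three entries the loop uses as plain 100x100 matrices.
d11 = squeeze(allData(1, 1, :, :));
d12 = squeeze(allData(1, 2, :, :));
d22 = squeeze(allData(2, 2, :, :));

% Contribution of each database element to the sum.
term = d11 + 2 * d12 .* d22;

% windowSums(y, x) is the sum of term over dy, dx = -9..9 around (y, x).
windowSums = conv2(term, ones(19), 'same');

% Add up the windows centred on x, y = 10..90.
result = sum(sum(windowSums(10:90, 10:90)));
```

Because of floating-point rounding, result will generally match the loop version only up to a small relative error, not bit for bit.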