In MATLAB, I have a for loop which has a lot of iterations to go through and fill a sparse matrix. The program is very slow and I would like to optimize it so that it finishes some time soon. In two lines I use the command find, and the MATLAB editor warns me that using logical indexing instead of find will improve performance. My code is quite similar to that presented in a MathWorks newsreader recommendation, where there is a vector of values and a vector of unique values generated from it. It uses find to obtain the index in the unique values (for updating the values in a matrix). To be brief, the code given is:
positions = find(X0_outputs == unique_outputs(j,1));
% should read
positions = X0_outputs == unique_outputs(j,1);
But the last line is not the index, but a vector of zeros and ones.
I have an illustrative example. Make a set of indices; tt=round(rand(1,6)*10):
tt = 3 7 1 7 1 7
Make a unique vector; ttUNI=unique(tt)
ttUNI = 1 3 7
Use find to get the position index of the value in the set of unique values; find(ttUNI(:) == tt(1))
ans = 2
Compare with using logical indexing; (ttUNI(:) == tt(1))
ans =
0
1
0
Having the value 2 is a lot more useful than that binary vector when I need to update the indices for a matrix. For my matrix, I can say mat(find(ttUNI(:) == tt(1)), 4) and that works, whereas using (ttUNI(:) == tt(1)) needs post-processing.
Is there a neat and efficient way of doing what is needed? Or is the use of find unavoidable in circumstances such as these?
UPDATE: As recommended by @Jonas, I am including the code here to give better insight into the problem I am having, along with some results from the profiler.
ALL_NODES = horzcat(network(:,1)',network(:,2)');
NUM_UNIQUE = unique(ALL_NODES);%unique and sorted
UNIQUE_LENGTH = length(NUM_UNIQUE);
TIME_MAX = max(network(:,3));
WEEK_NUM = floor((((TIME_MAX/60)/60)/24)/7);%divide seconds for minutes, for hours, for days and how many weeks
%initialize tensor of temporal networks
temp = length(NUM_UNIQUE);
%making the tensor a sparse 2D tensor!!! So each week is another replica of
%the matrix below
Atensor = sparse(length(NUM_UNIQUE)*WEEK_NUM,length(NUM_UNIQUE));
WEEK_SECONDS = 60*60*24*7;%number of seconds in a week
for ii=1:size(network,1)%go through all rows/observations
    WEEK_NOW = floor(network(ii,3)/WEEK_SECONDS) + 1;
    if(WEEK_NOW > WEEK_NUM)
        disp('end of weeks')
        break
    end
    data_node_i = network(ii,1);
    Atensor_row_num = find(NUM_UNIQUE(:) == data_node_i)...
        + (WEEK_NOW-1)*UNIQUE_LENGTH;
    data_node_j = network(ii,2);
    Atensor_col_num = find(NUM_UNIQUE(:) == data_node_j);
    %Atensor is sparse
    Atensor(Atensor_row_num,Atensor_col_num) = 1;
end
Here UNIQUE_LENGTH = 223482 and size(network,1)=273209. I ran the profiler for a few minutes, which was not enough time for the program to finish, but enough to reach a steady state where the ratio of times no longer changed much. The line Atensor_row_num = find(NUM_UNI.. takes 45.6% and the line Atensor_col_num = find(NUM_UNI... takes 43.4%. The line Atensor(Atensor_row_num,Atenso..., which assigns values to the sparse matrix, takes only 8.9%. The NUM_UNIQUE vector is quite long, so find dominates the runtime; it matters even more than the sparse matrix manipulation. Any improvement here would be significant. I also don't know whether there is a more efficient overall approach to this algorithm than simply replacing find.
find is indeed unavoidable under certain circumstances. For example, if you want to loop over indices, i.e.
idx = find(someCondition);
for i = idx(:)'
doSomething
end
or if you want to do multi-level indexing
A = [1:4,NaN,6:10];
goodA = find(isfinite(A));
everyOtherGoodEntry = A(goodA(1:2:end));
or if you want the first n good values
A = A(find(isfinite(A),n,'first'));
In your case, you may be able to avoid the call to find by using the additional outputs of unique:
[uniqueElements,indexIntoA,indexIntoUniqueElements] = unique(A);
Before you try to optimize your code by fixing what you think takes time, I suggest you run the profiler on your code to check what really takes time. And then you can possibly post the code of your actual loop, and we may be able to help.
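Applied to the loop posted in the question, the third output of unique could replace both find calls and the loop itself. The following is only a sketch under some assumptions: the small network matrix is made-up stand-in data, idx_i, idx_j, keep and rc are names introduced here for illustration, and rows are filtered with week <= WEEK_NUM rather than with break, which gives the same result only if network is sorted by time.

```matlab
% Hypothetical stand-in for the question's 'network' matrix:
% columns are [node_i, node_j, time_in_seconds].
network = [10 30 5;
           30 20 100;
           10 20 604800;
           20 10 1209700];

WEEK_SECONDS = 60*60*24*7;                         % seconds in a week
WEEK_NUM = floor(max(network(:,3))/WEEK_SECONDS);  % same value as in the question
numObs = size(network,1);

% The third output of unique gives, for every entry, its position in NUM_UNIQUE.
[NUM_UNIQUE, ~, idx] = unique([network(:,1); network(:,2)]);
UNIQUE_LENGTH = numel(NUM_UNIQUE);
idx_i = idx(1:numObs);        % positions of the first-column nodes
idx_j = idx(numObs+1:end);    % positions of the second-column nodes

% Week of every observation, computed for all rows at once.
week = floor(network(:,3)/WEEK_SECONDS) + 1;
keep = week <= WEEK_NUM;      % assumption: replaces the 'break' in the loop

rows = idx_i(keep) + (week(keep)-1)*UNIQUE_LENGTH;
cols = idx_j(keep);

% sparse() sums values at duplicate subscripts, so remove duplicate
% (row, col) pairs first to keep every stored entry equal to 1.
rc = unique([rows cols], 'rows');
Atensor = sparse(rc(:,1), rc(:,2), 1, UNIQUE_LENGTH*WEEK_NUM, UNIQUE_LENGTH);
```

Building the matrix with a single sparse call also avoids growing a sparse matrix one element at a time, which tends to become expensive as the number of nonzeros increases. Whether this is fast enough for your data is something the profiler would have to confirm.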
If you'd like to find the index of the true values in a logical vector, you can do the following:
>> r = rand(1,5)
r =
0.5323 0.3401 0.4182 0.8411 0.2300
>> logical_val = r < 0.5 % Check whether values are less than 0.5
logical_val =
0 1 1 0 1
>> temp = 1:size(r,2) % Create a vector from 1 to the size of r
temp =
1 2 3 4 5
>> temp(logical_val) % Get the indexes of the true values
ans =
2 3 5
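Relating this back to the example in the question: when the result is only used as a subscript, the logical vector can usually be passed to the matrix directly, so neither find nor the temp vector is needed. A minimal sketch, where mat is a hypothetical 3-by-4 matrix and mask is a name introduced here:

```matlab
tt = [3 7 1 7 1 7];          % example values from the question
ttUNI = unique(tt);          % 1 3 7
mat = reshape(1:12, 3, 4);   % hypothetical matrix with one row per unique value

mask = (ttUNI(:) == tt(1));  % logical column vector, true at position 2

viaFind = mat(find(mask), 4);  % indexing through find
viaMask = mat(mask, 4);        % indexing with the logical mask directly

mat(mask, 4) = 0;              % assignment through the mask works the same way
```

find is still needed when the numeric position itself enters a calculation, as in the row offset + (WEEK_NOW-1)*UNIQUE_LENGTH in the question's loop.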