
How to substitute `find` commands with `logical indexing` (MATLAB), for looking up vector value positions of unique values?

In MATLAB, I have a for loop with a lot of iterations that fills a sparse matrix. The program is very slow, and I would like to optimize it so that it finishes in a reasonable time. In two lines I use the command find, and the MATLAB editor warns me that using logical indexing instead of find will improve performance. My code is quite similar to the code in a MathWorks newsreader recommendation, where there is a vector of values and a vector of unique values generated from it, and find is used to obtain the index into the unique values (for updating the values in a matrix). To be brief, the code given is:

     positions = find(X0_outputs == unique_outputs(j,1));
% should read
     positions = X0_outputs == unique_outputs(j,1);

But the last line does not give the index; it gives a vector of zeros and ones. As an illustrative example, make a set of indices; tt=round(rand(1,6)*10):

 tt = 3     7     1     7     1     7

Make a unique vector; ttUNI=unique(tt)

ttUNI = 1     3     7

Use find to get the position index of the value in the set of unique values; find(ttUNI(:) == tt(1))

ans = 2

Compare with using logical indexing; (ttUNI(:) == tt(1))

ans =
 0
 1
 0

Having the value 2 is a lot more useful than that binary vector when I need to update the indices for a matrix. For my matrix, I can say mat(find(ttUNI(:) == tt(1)), 4) and that works, whereas using (ttUNI(:) == tt(1)) needs post-processing.

Is there a neat and efficient way of doing what is needed? Or is the use of find unavoidable in circumstances such as these?

UPDATE: As recommended by user @Jonas, I am including the code here to give better insight into the problem I am having, along with some of the profiler tool's results.

ALL_NODES = horzcat(network(:,1)',network(:,2)');
NUM_UNIQUE = unique(ALL_NODES);%unique and sorted    
UNIQUE_LENGTH = length(NUM_UNIQUE);
TIME_MAX = max(network(:,3));
WEEK_NUM = floor((((TIME_MAX/60)/60)/24)/7);%divide seconds for minutes, for hours, for days and how many weeks
%initialize tensor of temporal networks
temp = length(NUM_UNIQUE);
%making the tensor a sparse 2D tensor!!! So each week is another replica of
%the matrix below
Atensor = sparse(length(NUM_UNIQUE)*WEEK_NUM,length(NUM_UNIQUE));
WEEK_SECONDS = 60*60*24*7;%number of seconds in a week

for ii=1:size(network,1)%go through all rows/observations 
    WEEK_NOW = floor(network(ii,3)/WEEK_SECONDS) + 1;
    if(WEEK_NOW > WEEK_NUM)
        disp('end of weeks')
        break
    end
    data_node_i = network(ii,1);
    Atensor_row_num = find(NUM_UNIQUE(:) == data_node_i)...
        + (WEEK_NOW-1)*UNIQUE_LENGTH;
    data_node_j = network(ii,2);
    Atensor_col_num = find(NUM_UNIQUE(:) == data_node_j);
    %Atensor is sparse
    Atensor(Atensor_row_num,Atensor_col_num) = 1;          
end

Here UNIQUE_LENGTH = 223482 and size(network,1)=273209. I ran the profiler tool for a few minutes, which was not enough time for the program to finish, but enough to reach a steady state in which the ratio of times would not change much. Atensor_row_num = find(NUM_UNI.. is 45.6% and Atensor_col_num = find(NUM_UNI... is 43.4%. The line with Atensor(Atensor_row_num,Atenso..., which assigns values to the sparse matrix, is only 8.9%. The NUM_UNIQUE vector is quite long, so find dominates the run time, even more than the sparse matrix manipulation. Any improvement here would be significant. I also don't know whether there is a more efficient way to structure this algorithm, rather than taking the straightforward approach of replacing find.

asked Feb 27 '12 11:02 by Vass



2 Answers

find is indeed unavoidable under certain circumstances. For example, if you want to loop over indices, i.e.

idx = find(someCondition);
for i = idx(:)'
    doSomething
end

or if you want to do multi-level indexing

A = [1:4,NaN,6:10];
goodA = find(isfinite(A));
everyOtherGoodEntry = A(goodA(1:2:end));

or if you want the first n good values

A = A(find(isfinite(A),n,'first'));

In your case, you may be able to avoid the call to find by using the additional outputs of unique

[uniqueElements,indexIntoA,indexIntoUniqueElements] = unique(A);
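Applied to the loop in the question, the third output of unique could look roughly like the sketch below. This is only an illustration under stated assumptions: the tiny network matrix is made up, the names posInUnique, nodeI, nodeJ, weekNow, keep, rowIdx and colIdx are hypothetical, and the week filter matches the original break only if the rows are sorted by time.

```matlab
% Toy stand-in for the question's data: columns are [node_i, node_j, time in seconds].
% All values are invented for illustration.
WEEK_SECONDS = 60*60*24*7;
network = [ 30  70  10;
            70  10  20;
            10  70  WEEK_SECONDS + 5;
            30  10  2*WEEK_SECONDS + 9 ];

numRows   = size(network, 1);
ALL_NODES = [network(:,1); network(:,2)];   % column vector of all node ids

% Third output: for every element of ALL_NODES, its position in NUM_UNIQUE.
[NUM_UNIQUE, ~, posInUnique] = unique(ALL_NODES);
UNIQUE_LENGTH = numel(NUM_UNIQUE);

nodeI = posInUnique(1:numRows);        % stands in for find(NUM_UNIQUE(:) == data_node_i)
nodeJ = posInUnique(numRows+1:end);    % stands in for find(NUM_UNIQUE(:) == data_node_j)

WEEK_NUM = floor(max(network(:,3)) / WEEK_SECONDS);
weekNow  = floor(network(:,3) / WEEK_SECONDS) + 1;

% Assumption: rows are sorted by time, so dropping rows past WEEK_NUM
% has the same effect as the break in the original loop.
keep   = weekNow <= WEEK_NUM;
rowIdx = nodeI(keep) + (weekNow(keep) - 1) * UNIQUE_LENGTH;
colIdx = nodeJ(keep);

% Build the sparse matrix in one call instead of assigning element by element.
% sparse() sums values at repeated subscripts, so convert back to 0/1 afterwards.
Atensor = sparse(rowIdx, colIdx, 1, UNIQUE_LENGTH*WEEK_NUM, UNIQUE_LENGTH);
Atensor = double(Atensor ~= 0);
```

The positions come from a single call to unique rather than one find per row, and the sparse matrix is assembled once from index vectors. The second output of ismember, as in [~, loc] = ismember(network(:,1), NUM_UNIQUE), should give the same positions if NUM_UNIQUE has already been computed.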

Before you try to optimize your code by fixing what you think takes time, I suggest you run the profiler on your code to check what really takes time. And then you can possibly post the code of your actual loop, and we may be able to help.

answered Nov 10 '22 08:11 by Jonas



If you'd like to find the index of the true values in a logical vector, you can do the following:

>> r = rand(1,5) 
r =
    0.5323    0.3401    0.4182    0.8411    0.2300

>> logical_val = r < 0.5            % Check whether values are less than 0.5
logical_val =
     0     1     1     0     1

>> temp = 1:size(r,2)               % Create a vector from 1 to the size of r
temp =
     1     2     3     4     5

>> temp(logical_val)                % Get the indexes of the true values
ans =
     2     3     5
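
As a related sketch using the values from the question, the logical vector can usually be placed directly in the subscript position, so no conversion to numeric indexes is needed when the goal is only to read or update matrix entries. The matrix mat below is a hypothetical stand-in, and viaFind and viaMask are illustrative names.

```matlab
tt    = [3 7 1 7 1 7];          % values from the question
ttUNI = unique(tt);             % sorted unique values: 1 3 7
mat   = reshape(1:12, 3, 4);    % hypothetical 3-by-4 matrix, for illustration only

mask = (ttUNI(:) == tt(1));     % logical column vector, true where ttUNI equals 3

viaFind = mat(find(mask), 4);   % numeric subscript, as in the question
viaMask = mat(mask, 4);         % logical subscript selects the same element

mat(mask, 4) = 0;               % assignment through the mask works the same way
```

Here find(mask) and the mask itself pick out the same row, so the find call is only needed when the numeric position is used for arithmetic, such as adding an offset to it.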
answered Nov 10 '22 08:11 by Alex L
