Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to avoid the loop to reduce the computation time of this code?

how to avoid the loop to reduce the computation time of this code (one solution of my last question):

I hope to find the column vectors of A(1:3,:) whose corresponding values in M(4,:) are not part of one of the vectors of the cell X (and obviously not equal to one of these vectors). I look for a fast solution if X is very large.

M = [1007  1007  4044  1007  4044  1007  5002 5002 5002 622 622;
      552   552   300   552   300   552   431  431  431 124 124; 
     2010  2010  1113  2010  1113  2010  1100 1100 1100  88  88;
        7    12    25    15    12    30     2   10   55  32  12];

Here I take directly A:

A = [1007  4044  5002  622;
      552   300   431  124;
     2010  1113  1100   88];

A contains unique column vectors of M(1:3,:)

X = {[2 5 68 44],[2 10 55 9 17],[1 55 6 7 8 9],[32 12]};

[~, ~, subs] = unique(M(1:3,:)','rows');

A4 = accumarray(subs(:),M(4,:).',[],@(x) {x});

%// getting a mask of which columns we want
idxC(length(A4)) = false;
for ii = 1:length(A4)
    idxC(ii) = ~any(cellfun(@(x) all(ismember(A4{ii},x)), X));
end

Displaying the columns we want

out = A(:,idxC)

Results:

>> out

out =

    1007        4044
     552         300
    2010        1113

the column vector [5002;431;1100] was eliminated because [2;10;55] is contained in X{2} = [2 10 55 9 17]

the column vector [622;124;88] was eliminated because [32 12] = X{4}

Another example: with the same X

    M = [1007  4044  1007  4044  1007  5002 5002 5002 622 622  1007  1007  1007;
          552   300   552   300   552   431  431  431 124 124   552    11    11; 
         2010  1113  2010  1113  2010  1100 1100 1100  88  88  2010    20    20;
           12    25    15    12    30     2   10   55  32  12     7    12     7];

X = {[2 5 68 44],[2 10 55 9 17],[1 55 6 7 8 9],[32 12]};

A = [1007  4044  5002  622  1077;
      552   300   431  124    11;
     2010  1113  1100   88    20];

Results: (with scmg answer)

I get if A sorted according to the first row: (correct result)

out =

         1007        1007        4044
           11         552         300
           20        2010        1113

if I do not sort the matrix A, I get: (false result)

out =

        4044        5002         622
         300         431         124
        1113        1100          88

the column vector A(:,4) = [622;124;88] should be eliminated because [32 12] = X{4}.

the column vector [5002;431;1100] should be eliminated because [2;10;55] is contained in X{2} = [2 10 55 9 17]

like image 245
bzak Avatar asked May 22 '15 20:05

bzak


People also ask

How do you avoid a loop in programming?

Recursion. Another way we can avoid using imperative loops is through recursion. Recursion is simple. Have a function call itself (which creates a loop) and design an exit condition out of that loop.

How can we reduce the time complexity of a for loop?

Putting all together. Continuing on the challenge to reduce the iterations on our algorithm, we have to perform the following steps: build the "index" with the information to be accessed later. iterate over the loop and fetch the additional information from the previously created "index"

How can loop performance be improved?

The best ways to improve loop performance are to decrease the amount of work done per iteration and decrease the number of loop iterations. Generally speaking, switch is always faster than if-else , but isn't always the best solution.


2 Answers

The answer of Ben Voigt is great, but the line for A4i = A4{ii} is the one causing issues : the for loop doesn't work this way with column vectors :

%row vector
for i = 1:3
    disp('foo');
end

    foo
    foo
    foo

%column vector
for i = (1:3).'
    disp('foo');
end

    foo

Just try for A4i = A4{ii}.' instead and it should get your work done!

Now, if we look at the output :

A(:,idxC) =

    4044        5002
     300         431
    1113        1100

As you can see, the final result is not what we expected.

As long as unique does a kind of sort, the subs are not numbered by the order of encounter in A, but by order of encounter in C (which is sorted) :

subs =

 2
 2
 3
 2
 3
 2
 4
 4
 4
 1
 1

Therefore you should pass by the matrix given by unique rather than A to get your final output

Enter

[C, ~, subs] = unique(M(1:3,:)','rows'); 
%% rather than [~, ~, subs] = unique(M(1:3,:)','rows');

Then, to get the final output, enter

>> out = C(idxC,:).'
out =

        1007        4044
         552         300
        2010        1113
like image 96
Ikaros Avatar answered Oct 23 '22 12:10

Ikaros


In this case, you should not be trying to eliminate loops. The vectorization is actually hurting you badly.

In particular (giving a name to your anonymous lambda)

issubset = @(x) all(ismember(A4{ii},x))

is ridiculously inefficient, because it doesn't short-circuit. Replace that with a loop.

Same for

any(cellfun(issubset, X))

Use an approach similar to this instead:

idxC = true(size(A4));
NX = numel(X);
for ii = 1:length(A4)
    for jj = 1:NX
        xj = X{jj};
        issubset = true;
        for A4i=A4{ii}
            if ~ismember(A4i, xj)
                issubset = false;
                break;
            end;
        end;
        if issubset
            idxC(ii) = false;
            break;
        end;
    end;
end;

The two break statements, and especially the second one, trigger an early exit that potentially saves you a huge amount of computation.

like image 32
Ben Voigt Avatar answered Oct 23 '22 13:10

Ben Voigt