Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find elements meeting any of a number of criteria

Tags:

matlab

I am trying to find the indices of elements in a vector that correspond to another vector, preferably without using loops. For example, my input might be:

DJiSet = [5 7 8];                   % elements of which I need the indices
JiSet = [3 4 5 6 7 8 9 11 12 20];   % vector to search

The output here would be:

[3 5 6]

The fastest I've come up with so far is this:

Output = find(ismember(JiSet,DJiSet));

However I have a feeling this could be done faster, especially since I thought the find command is rather slow.

Things to note:

  • The values in DJiSet are guaranteed to all be present in JiSet
  • JiSet is always sorted in ascending order, without repeated entries
  • The vector DJiSet is not guaranteed to be found contiguously in JiSet
like image 320
Wouter Kuijsters Avatar asked Mar 17 '15 14:03

Wouter Kuijsters


4 Answers

Approach #1

You can avoid find by reversing the places of DJiSet and JiSet inside ismember and then use the second output that gives us the matching indices -

[~,out] = ismember(DJiSet,JiSet)

Approach #2

Loopy approach catering to the specific conditions set in the question could be tried out, not sure if this will be more efficient though -

intv_idx = zeros(1,numel(DJiSet));
intv_idx(1) = find(JiSet==DJiSet(1),1);
start = intv_idx(1)+1;
for k = 2:numel(DJiSet)
    idx = find(JiSet(start:end)==DJiSet(k),1);
    start = idx+start;
    intv_idx(k) = idx;
end
out = cumsum(intv_idx);
like image 106
Divakar Avatar answered Nov 04 '22 23:11

Divakar


Divakar's answer is the way to go. But in case you want to do it more manually:

[~, Output] = max(bsxfun(@eq, DJiSet(:).', JiSet(:)), [], 1);

This finds the first occurrence if there are more than one.

If the values in DJiSet were not guaranteed to be present in JiSet, you could use a small modification:

[val, Output] = max(bsxfun(@eq, DJiSet(:).', JiSet(:))); %'
Output(~val) = 0; %// 0 indicates "not found"
like image 43
Luis Mendo Avatar answered Nov 05 '22 00:11

Luis Mendo


For small datasets, it seems my original approach was faster than both the ismember solution proposed by Divakar and the intersect solution proposed by qmeeeeeee, but all three get beaten by Luis Mendo's solution using good old bsxfun. See below code, which times each approach:

function somescript()

IsmemberTime = timeit(@membersol)
IntersectTime = timeit(@intersectsol)
FindTime = timeit(@naivesol)
BsxTime = timeit(@bsxfunsol)

    function membersol()
        rng(1)
        set = randi(30,[1000 15]);             % generate 1000 vectors of length 15, containing random integers
        for i=1:1000
            [~,out] = ismember(set(i,1:5),set(i,6:end));      % first 5 random integers are the values to be found in the remainder of the vector
        end

    end

    function intersectsol()
        rng(1)
        set = randi(30,[1000 15]);
        for i=1:1000
            [~,~,Output] = intersect(set(i,1:5),set(i,6:end));
        end
    end

    function naivesol()
        rng(1)
        set = randi(30,[1000 15]);
        for i=1:1000
            Output = find(ismember(set(i,6:end),set(i,1:5)));
        end
    end

    function bsxfunsol()
        rng(1)
        set = randi(30,[1000 15]);
        for i=1:1000
            [~, Output] = max(bsxfun(@eq, set(i,1:5).', set(i,6:end)), [], 1);
        end
    end
end

Which on my machine (running R2014b) returns the following timings:

IsmemberTime =

    0.1101


IntersectTime =

    0.2008


FindTime =

    0.0698


BsxTime =

    0.0218

This suggests that, for small data sets at least, using find and ismember on the inverted order of vectors is actually faster than using ismember alone. Since there is also some fixed overhead for all methods from the generation of the datasets set that are used to test with, the difference seems to be pretty big. More thorough tests can ben found in the comments below.

like image 5
Wouter Kuijsters Avatar answered Nov 05 '22 00:11

Wouter Kuijsters


Perhaps you could try intersect? It is suppose to be a lot faster:

[Intersect,indDJiSet,indJiSet] = intersect(DJiSet,JiSet)

The ordering does not matter, as long as the element exist in both list, the ind elements gives the index.

like image 4
GameOfThrows Avatar answered Nov 04 '22 23:11

GameOfThrows