I am trying to find the indices of elements in a vector that correspond to another vector, preferably without using loops. For example, my input might be:
DJiSet = [5 7 8]; % elements of which I need the indices
JiSet = [3 4 5 6 7 8 9 11 12 20]; % vector to search
The output here would be:
[3 5 6]
The fastest I've come up with so far is this:
Output = find(ismember(JiSet,DJiSet));
However I have a feeling this could be done faster, especially since I thought the find
command is rather slow.
Things to note:
DJiSet
are guaranteed to all be present in JiSet
JiSet
is always sorted in ascending order, without repeated entriesDJiSet
is not guaranteed to be found contiguously in JiSet
Approach #1
You can avoid find
by reversing the places of DJiSet
and JiSet
inside ismember
and then use the second output that gives us the matching indices -
[~,out] = ismember(DJiSet,JiSet)
Approach #2
Loopy approach catering to the specific conditions set in the question could be tried out, not sure if this will be more efficient though -
intv_idx = zeros(1,numel(DJiSet));
intv_idx(1) = find(JiSet==DJiSet(1),1);
start = intv_idx(1)+1;
for k = 2:numel(DJiSet)
idx = find(JiSet(start:end)==DJiSet(k),1);
start = idx+start;
intv_idx(k) = idx;
end
out = cumsum(intv_idx);
Divakar's answer is the way to go. But in case you want to do it more manually:
[~, Output] = max(bsxfun(@eq, DJiSet(:).', JiSet(:)), [], 1);
This finds the first occurrence if there are more than one.
If the values in DJiSet
were not guaranteed to be present in JiSet
, you could use a small modification:
[val, Output] = max(bsxfun(@eq, DJiSet(:).', JiSet(:))); %'
Output(~val) = 0; %// 0 indicates "not found"
For small datasets, it seems my original approach was faster than both the ismember
solution proposed by Divakar and the intersect
solution proposed by qmeeeeeee, but all three get beaten by Luis Mendo's solution using good old bsxfun
. See below code, which times each approach:
function somescript()
IsmemberTime = timeit(@membersol)
IntersectTime = timeit(@intersectsol)
FindTime = timeit(@naivesol)
BsxTime = timeit(@bsxfunsol)
function membersol()
rng(1)
set = randi(30,[1000 15]); % generate 1000 vectors of length 15, containing random integers
for i=1:1000
[~,out] = ismember(set(i,1:5),set(i,6:end)); % first 5 random integers are the values to be found in the remainder of the vector
end
end
function intersectsol()
rng(1)
set = randi(30,[1000 15]);
for i=1:1000
[~,~,Output] = intersect(set(i,1:5),set(i,6:end));
end
end
function naivesol()
rng(1)
set = randi(30,[1000 15]);
for i=1:1000
Output = find(ismember(set(i,6:end),set(i,1:5)));
end
end
function bsxfunsol()
rng(1)
set = randi(30,[1000 15]);
for i=1:1000
[~, Output] = max(bsxfun(@eq, set(i,1:5).', set(i,6:end)), [], 1);
end
end
end
Which on my machine (running R2014b) returns the following timings:
IsmemberTime =
0.1101
IntersectTime =
0.2008
FindTime =
0.0698
BsxTime =
0.0218
This suggests that, for small data sets at least, using find
and ismember
on the inverted order of vectors is actually faster than using ismember
alone. Since there is also some fixed overhead for all methods from the generation of the datasets set
that are used to test with, the difference seems to be pretty big. More thorough tests can ben found in the comments below.
Perhaps you could try intersect? It is suppose to be a lot faster:
[Intersect,indDJiSet,indJiSet] = intersect(DJiSet,JiSet)
The ordering does not matter, as long as the element exist in both list, the ind elements gives the index.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With