Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way of finding the only index of vector b where array A(i,j) == b

I have 2 big arrays A and b:

A: 10.000++ rows, 4 columns, not unique integers
b: vector with 500.000++ elements, unique integers

Due to the uniqueness of the values of b, I need to find the only index of b, where A(i,j) == b.

What I started with is

[rows,columns] = size(A);
B = zeros(rows,columns);
for i = 1 : rows
    for j = 1 : columns
        B(i,j) = find(A(i,j)==b,1);
    end
end

This takes approx 5.5 seconds to compute, which is way to long, since A and b can be significantly bigger... That in mind I tried to speed up the code by using logical indexing and reducing the for-loops

[rows,columns] = size(A);
B = zeros(rows,columns);
for idx = 1 : numel(b)
    B(A==b(idx)) = idx;
end

Sadly this takes even longer: 21 seconds

I even tried to do use bsxfun

for i = 1 : columns
   [I,J] = find(bsxfun(@eq,A(:,i),b))
    ... stitch B together ...
end

but with a bigger arrays the maximum array size is quickly exceeded (102,9GB...).

Can you help me find a faster solution to this? Thanks in advance!

EDIT: I extended find(A(i,j)==b,1), which speeds up the algorithm by factor 2! Thank you, but overall still too slow... ;)

like image 483
Johannes Avatar asked May 31 '18 16:05

Johannes


People also ask

How do you find the index of an array?

To find the position of an element in an array, you use the indexOf() method. This method returns the index of the first occurrence the element that you want to find, or -1 if the element is not found.

How do you find the index of a vector in MATLAB?

k = find( X ) returns a vector containing the linear indices of each nonzero element in array X . If X is a vector, then find returns a vector with the same orientation as X . If X is a multidimensional array, then find returns a column vector of the linear indices of the result.

How do you find the index of an element in a NumPy array?

Using ndenumerate() function to find the Index of value It is usually used to find the first occurrence of the element in the given numpy array.


1 Answers

The function ismember is the right tool for this:

[~,B] = ismember(A,b);

Test code:

function so
  A = rand(1000,4);
  b = unique([A(:);rand(2000,1)]);

  B1 = op1(A,b);
  B2 = op2(A,b);
  isequal(B1,B2)

  tic;op1(A,b);op1(A,b);op1(A,b);op1(A,b);toc
  tic;op2(A,b);op2(A,b);op2(A,b);op2(A,b);toc
end

function B = op1(A,b)
  B = zeros(size(A));
  for i = 1:numel(A)
    B(i) = find(A(i)==b,1);
  end
end

function B = op2(A,b)
  [~,B] = ismember(A,b);
end

I ran this on Octave, which is not as fast with loops as MATLAB. It also doesn't have the timeit function, hence the crappy timing using tic/toc (sorry for that). In Octave, op2 is more than 100 times faster than op1. Timings will be different in MATLAB, but ismember should still be the fastest option. (Note I also replaced your double loop with a single loop, this is the same but simpler and probably faster.)

If you want to repeatedly do the search in b, it is worthwhile to sort b first, and implement your own binary search. This will avoid the checks and sorting that ismember does. See this other question.

like image 64
Cris Luengo Avatar answered Oct 25 '22 12:10

Cris Luengo