I have 2 big arrays, A and b:
A: 10,000+ rows, 4 columns, non-unique integers
b: vector with 500,000+ elements, unique integers
Because the values of b are unique, for each element A(i,j) I need to find the single index of b where A(i,j) == b.
What I started with is:
[rows,columns] = size(A);
B = zeros(rows,columns);
for i = 1 : rows
    for j = 1 : columns
        B(i,j) = find(A(i,j)==b,1);
    end
end
This takes approx. 5.5 seconds to compute, which is way too long, since A and b can be significantly bigger... With that in mind, I tried to speed up the code by using logical indexing and reducing the for-loops:
[rows,columns] = size(A);
B = zeros(rows,columns);
for idx = 1 : numel(b)
    B(A==b(idx)) = idx;
end
Sadly, this takes even longer: 21 seconds.
I even tried using bsxfun:
for i = 1 : columns
    [I,J] = find(bsxfun(@eq,A(:,i),b))
    ... stitch B together ...
end
but with bigger arrays the maximum array size is quickly exceeded (102.9 GB...).
Can you help me find a faster solution to this? Thanks in advance!
EDIT: I extended the call to find(A(i,j)==b,1), which speeds up the algorithm by a factor of 2! Thank you, but overall still too slow... ;)
The function ismember is the right tool for this:
[~,B] = ismember(A,b);
Test code:
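function so
    % Build test data: b contains every value of A plus extra values, all unique
    A = rand(1000,4);
    b = unique([A(:);rand(2000,1)]);
    % Check that both implementations give the same result
    B1 = op1(A,b);
    B2 = op2(A,b);
    isequal(B1,B2)
    % Rough timing: four calls of each implementation
    tic;op1(A,b);op1(A,b);op1(A,b);op1(A,b);toc
    tic;op2(A,b);op2(A,b);op2(A,b);op2(A,b);toc
end
function B = op1(A,b)
    % Loop with find, using linear indexing over A
    B = zeros(size(A));
    for i = 1:numel(A)
        B(i) = find(A(i)==b,1);
    end
end
function B = op2(A,b)
    % Second output of ismember is the index into b of each element of A
    [~,B] = ismember(A,b);
end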
I ran this on Octave, which is not as fast with loops as MATLAB. It also doesn't have the timeit function, hence the crappy timing using tic/toc (sorry for that). In Octave, op2 is more than 100 times faster than op1. Timings will be different in MATLAB, but ismember should still be the fastest option. (Note that I also replaced your double loop with a single loop; this is the same but simpler and probably faster.)
If you want to repeatedly do the search in b, it is worthwhile to sort b first and implement your own binary search. This will avoid the checks and sorting that ismember does. See this other question.
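As a rough illustration of that idea, here is a minimal sketch of a binary search over a pre-sorted copy of b. The function name sorted_lookup and the variable names bs and order are made up for this example, and it assumes b was sorted once beforehand with [bs,order] = sort(b). Whether a hand-written loop like this actually beats ismember depends on the MATLAB or Octave version and the data size, so it should be timed before relying on it.

```matlab
function idx = sorted_lookup(bs, order, x)
% SORTED_LOOKUP  Hypothetical helper: position of x in the original vector b.
%   bs and order are assumed to come from [bs,order] = sort(b), computed once.
%   Returns 0 if x is not present (the same convention ismember uses).
%
%   Possible usage:
%     [bs,order] = sort(b);
%     B = zeros(size(A));
%     for k = 1:numel(A)
%         B(k) = sorted_lookup(bs, order, A(k));
%     end
lo = 1;
hi = numel(bs);
idx = 0;
while lo <= hi
    mid = floor((lo + hi) / 2);
    if bs(mid) == x
        idx = order(mid);   % map back to the index in the unsorted b
        return
    elseif bs(mid) < x
        lo = mid + 1;       % x can only be in the upper half
    else
        hi = mid - 1;       % x can only be in the lower half
    end
end
end
```

Each lookup needs about log2(numel(b)) comparisons instead of a scan over all of b, and the sort is paid for only once no matter how many times A changes.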