Faster version of find for sorted vectors (MATLAB)

Tags:

I have code of the following kind in MATLAB:

indices = find([1 2 2 3 3 3 4 5 6 7 7] == 3)

This returns 4,5,6 - the indices of elements in the array equal to 3. Now. my code does this sort of thing with very long vectors. The vectors are always sorted.

Therefore, I would like a function which replaces the O(n) complexity of find with O(log n), at the expense that the array has to be sorted.

I am aware of ismember, but for what I know it does not return the indices of all items, just the last one (I need all of them).

For reasons of portability, I need the solution to be MATLAB-only (no compiled mex files etc.)

942

asked Nov 23 '13 19:11

2 Answers

Here is a fast implementation using binary search. This file is also available on github

function [b,c]=findInSorted(x,range) %findInSorted fast binary search replacement for ismember(A,B) for the %special case where the first input argument is sorted. %    %   [a,b] = findInSorted(x,s) returns the range which is equal to s.  %   r=a:b and r=find(x == s) produce the same result    %   %   [a,b] = findInSorted(x,[from,to]) returns the range which is between from and to %   r=a:b and r=find(x >= from & x <= to) return the same result % %   For any sorted list x you can replace %   [lia] = ismember(x,from:to) %   with %   [a,b] = findInSorted(x,[from,to]) %   lia=a:b % %   Examples: % %       x  = 1:99 %       s  = 42 %       r1 = find(x == s) %       [a,b] = myFind(x,s) %       r2 = a:b %       %r1 and r2 are equal % %   See also FIND, ISMEMBER. % % Author Daniel Roeske <danielroeske.de>  A=range(1); B=range(end); a=1; b=numel(x); c=1; d=numel(x); if A<=x(1)    b=a; end if B>=x(end)     c=d; end while (a+1<b)     lw=(floor((a+b)/2));     if (x(lw)<A)         a=lw;     else         b=lw;     end end while (c+1<d)     lw=(floor((c+d)/2));     if (x(lw)<=B)         c=lw;     else         d=lw;     end end end

answered Oct 24 '22 06:10

Daniel's approach is clever and his myFind2 function is definitely fast, but there are errors/bugs that occur near the boundary conditions or in the case that the upper and lower bounds produce a range outside the set passed in.

Additionally, as he noted in his comment on his answer, his implementation had some inefficiencies that could be improved. I implemented an improved version of his code, which runs faster, while also correctly handling boundary conditions. Furthermore, this code includes more comments to explain what is happening. I hope this helps someone the way Daniel's code helped me here!

function [lower_index,upper_index] = myFindDrGar(x,LowerBound,UpperBound) % fast O(log2(N)) computation of the range of indices of x that satify the % upper and lower bound values using the fact that the x vector is sorted % from low to high values. Computation is done via a binary search. % % Input: % % x-            A vector of sorted values from low to high.        % % LowerBound-   Lower boundary on the values of x in the search % % UpperBound-   Upper boundary on the values of x in the search % % Output: % % lower_index-  The smallest index such that %               LowerBound<=x(index)<=UpperBound % % upper_index-  The largest index such that %               LowerBound<=x(index)<=UpperBound  if LowerBound>x(end) || UpperBound<x(1) || UpperBound<LowerBound     % no indices satify bounding conditions     lower_index = [];     upper_index = [];     return; end  lower_index_a=1; lower_index_b=length(x); % x(lower_index_b) will always satisfy lowerbound upper_index_a=1;         % x(upper_index_a) will always satisfy upperbound upper_index_b=length(x);  % % The following loop increases _a and decreases _b until they differ  % by at most 1. Because one of these index variables always satisfies the  % appropriate bound, this means the loop will terminate with either  % lower_index_a or lower_index_b having the minimum possible index that  % satifies the lower bound, and either upper_index_a or upper_index_b  % having the largest possible index that satisfies the upper bound.  % while (lower_index_a+1<lower_index_b) || (upper_index_a+1<upper_index_b)      lw=floor((lower_index_a+lower_index_b)/2); % split the upper index      if x(lw) >= LowerBound         lower_index_b=lw; % decrease lower_index_b (whose x value remains \geq to lower bound)        else         lower_index_a=lw; % increase lower_index_a (whose x value remains less than lower bound)         if (lw>upper_index_a) && (lw<upper_index_b)             upper_index_a=lw;% increase upper_index_a (whose x value remains less than lower bound and thus upper bound)         end     end      up=ceil((upper_index_a+upper_index_b)/2);% split the lower index     if x(up) <= UpperBound         upper_index_a=up; % increase upper_index_a (whose x value remains \leq to upper bound)      else         upper_index_b=up; % decrease upper_index_b         if (up<lower_index_b) && (up>lower_index_a)             lower_index_b=up;%decrease lower_index_b (whose x value remains greater than upper bound and thus lower bound)         end     end end  if x(lower_index_a)>=LowerBound     lower_index = lower_index_a; else     lower_index = lower_index_b; end if x(upper_index_b)<=UpperBound     upper_index = upper_index_b; else     upper_index = upper_index_a; end

Note that the improved version of Daniels searchFor function is now simply:

function [lower_index,upper_index] = mySearchForDrGar(x,value)  [lower_index,upper_index] = myFindDrGar(x,value,value);

EDIT many years later: there was an error in the last two if/else blocks, fixed it.

answered Oct 24 '22 06:10

DrGar

Related questions
                            
                                Java: Replace all ' in a string with \'
                            
                                Date Math / Manipulation in Liquid Template Filter
                            
                                Which has faster performance indexesOfObjectsPassingTest or filteredArrayUsingPredicate?
                            
                                Javascript/Ember - Toggle Boolean Value of Variable
                            
                                how to open a local PDF in PDFJS using file input?
                            
                                Change a cookie value of a cookie that already exists
                            
                                Google Cloud Storage + Python : Any way to list obj in certain folder in GCS?
                            
                                Collapse text by group in data frame [duplicate]
                            
                                How to print the memory address of a slice in Golang?
                            
                                Display Image from Media Library in Umbraco 7
                            
                                Conform to protocol in ViewController, in Swift
                            
                                CREATE UNIQUE INDEX IF NOT EXISTS in postgreSQL

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Faster version of find for sorted vectors (MATLAB)

Tags:

ziutek

People also ask

2 Answers

Daniel

DrGar

Recent Activity

Donate For Us