Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I efficiently find the index of a value in a sorted array?

Tags:

r

bisection

I have a sorted array of values and a single value like so:

x <- c(1.0, 3.45, 5.23, 7.3, 12.5, 23.45)
v <- 6.45

I can find the index of the value after which v would be inserted into x while maintaining the sorting order:

max(which(x <= v))
[1] 3

It is nice and compact code, but I have the gut feeling that behind-the-scenes this is really inefficient: since which() does not know that the array is sorted it has to inspect all values.

Is there a better way of finding this index value?

Note: I am not interested in actually merging v into x. I just want the index value.

like image 997
Patrick Avatar asked Dec 06 '21 04:12

Patrick


People also ask

Which one is an efficient way to search through a sorted array?

If we know nothing about the distribution of key values, then we have just proved that binary search is the best algorithm available for searching a sorted array.

How do you find the index of an array?

To find the position of an element in an array, you use the indexOf() method. This method returns the index of the first occurrence the element that you want to find, or -1 if the element is not found. The following illustrates the syntax of the indexOf() method.

How do you return a value from a sorted array?

Given a sorted array of integer numbers, write a function which returns zero-based position on which the specified value is located. Function should return negative value if requested number cannot be found in the array. If value occurs more than once, function should return position of the first occurrence.

How do I find the index of a specific value?

Whenever I've wanted to find the index of a specific value I subtract the value of the element I want then take the min () of the abs () of that. Sign in to answer this question.

How do I create an array of 10 random numbers?

The command to create an array of 10 random numbers, display the contents of the array, find the index number of one item in the array, and then verify that value is shown in the following image. It is common to need to work with one half of an array at a time. As it turns out, this is incredibly easy to do—I simply use the range operator.

How to check if an array contains a requested value?

The simplest way to see if an array contains a requested value is to iterate through the array and stop as soon as the value is located. Alternatively, search fails if end of array is reached.


Video Answer


3 Answers

If you need a faster version and you don't need to check your inputs you can write an easy C++ function:

Rcpp::cppFunction(
  "int foo(double x, const Rcpp::NumericVector& v)
  {
    int min = 0;
    int max = v.size();
    while (max - min > 1)
    {
      int idx = (min + max) / 2;
      if (v[idx] > x)
      {
        max = idx;
      }
      else
      {
        min = idx;
      }
    }
    return min + 1;
  }"
)

If you need it, you can check if (x < v[0]) by yourself (I don't know what you want to see in this case). And you can test it by using package microbenchmark:

library(microbenchmark)

n = 1e6
v = sort(rnorm(n, 0, 15))
x = runif(1, -15, 15)
microbenchmark(max(which(v <= x)), sum(v <= x), findInterval(x, v), foo(x, v))

Result:

Enter image description here

like image 167
Егор Шишунов Avatar answered Oct 26 '22 11:10

Егор Шишунов


Benchmark based on Егор-Шишунов's answer:

# Functions:
Rcpp::cppFunction(
  "int Erop(double x, const Rcpp::NumericVector& v)
  {
    int min = 0;
    int max = v.size();
    while (max - min > 1)
    {
      int idx = (min + max) / 2;
      if (v[idx] > x)
      {
        max = idx;
      }
      else
      {
        min = idx;
      }
    }
    return min + 1;
  }"
)

Rcpp::cppFunction(
  "int GKi(double v, const Rcpp::NumericVector& x) {
     return std::distance(x.begin(), std::upper_bound(x.begin(), x.end(), v));
}")

Rcpp::cppFunction("
Rcpp::IntegerVector GKi2(const Rcpp::NumericVector& v 
                       , const Rcpp::NumericVector& x) {
  Rcpp::IntegerVector res(v.length());
  for(int i=0; i < res.length(); ++i) {
    res[i] = std::distance(x.begin(), std::upper_bound(x.begin(), x.end(), v[i]));
  }
  return res;
}")
# Data:
set.seed(42)
x <- sort(rnorm(1e6))
v <- sort(c(sample(x, 15), rnorm(15)))
# Result:
bench::mark(whichMax= sapply(v, \(v) max(which(x <= v)))
          , sum = sapply(v, \(v) sum(x<=v))
          , findInterval = findInterval(v, x)
          , Erop = sapply(v, \(v) Erop(v, x))
          , GKi = sapply(v, \(v) GKi(v, x))
          , GKi2 = GKi2(v, x)
)
#  expression        min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#  <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#1 whichMax      92.03ms 102.32ms      9.15        NA    102.      5    56
#2 sum           74.91ms  77.84ms     12.0         NA     37.9     6    19
#3 findInterval 680.41µs 755.61µs   1263.          NA      0     632     0
#4 Erop          57.19µs  62.13µs  12868.          NA     24.0  6432    12
#5 GKi           54.53µs   60.4µs  13316.          NA     24.0  6657    12
#6 GKi2           2.02µs   2.38µs 386027.          NA      0   10000     0
like image 40
7 revs, 2 users 91% Avatar answered Oct 26 '22 11:10

7 revs, 2 users 91%


You can use findInterval which makes use of a binary search.

findInterval(v, x)
#[1] 3

Or using C++ upper_bound with Rcpp.

Rcpp::cppFunction(
  "int upper_bound(double v, const Rcpp::NumericVector& x) {
     return std::distance(x.begin(), std::upper_bound(x.begin(), x.end(), v));
}")

upper_bound(v, x)
#[1] 3

Or in case you have also a vector of positions like in findInterval.

Rcpp::cppFunction("
Rcpp::IntegerVector upper_bound2(const Rcpp::NumericVector& v
                               , const Rcpp::NumericVector& x) {
  Rcpp::IntegerVector res(v.length());
  for(int i=0; i < res.length(); ++i) {
    res[i] = std::distance(x.begin(), std::upper_bound(x.begin(), x.end(), v[i]));
  }
  return res;
}")

v <- c(3, 6.45)
upper_bound2(v, x)
#[1] 1 3
findInterval(v, x)
#[1] 1 3
like image 41
GKi Avatar answered Oct 26 '22 11:10

GKi