I have a sorted array of values and a single value like so:
x <- c(1.0, 3.45, 5.23, 7.3, 12.5, 23.45)
v <- 6.45
I can find the index of the value after which v
would be inserted into x
while maintaining the sorting order:
max(which(x <= v))
[1] 3
It is nice and compact code, but I have the gut feeling that behind-the-scenes this is really inefficient: since which()
does not know that the array is sorted it has to inspect all values.
Is there a better way of finding this index value?
Note: I am not interested in actually merging v
into x
. I just want the index value.
If we know nothing about the distribution of key values, then we have just proved that binary search is the best algorithm available for searching a sorted array.
To find the position of an element in an array, you use the indexOf() method. This method returns the index of the first occurrence the element that you want to find, or -1 if the element is not found. The following illustrates the syntax of the indexOf() method.
Given a sorted array of integer numbers, write a function which returns zero-based position on which the specified value is located. Function should return negative value if requested number cannot be found in the array. If value occurs more than once, function should return position of the first occurrence.
Whenever I've wanted to find the index of a specific value I subtract the value of the element I want then take the min () of the abs () of that. Sign in to answer this question.
The command to create an array of 10 random numbers, display the contents of the array, find the index number of one item in the array, and then verify that value is shown in the following image. It is common to need to work with one half of an array at a time. As it turns out, this is incredibly easy to do—I simply use the range operator.
The simplest way to see if an array contains a requested value is to iterate through the array and stop as soon as the value is located. Alternatively, search fails if end of array is reached.
If you need a faster version and you don't need to check your inputs you can write an easy C++ function:
Rcpp::cppFunction(
"int foo(double x, const Rcpp::NumericVector& v)
{
int min = 0;
int max = v.size();
while (max - min > 1)
{
int idx = (min + max) / 2;
if (v[idx] > x)
{
max = idx;
}
else
{
min = idx;
}
}
return min + 1;
}"
)
If you need it, you can check if (x < v[0])
by yourself (I don't know what you want to see in this case). And you can test it by using package microbenchmark:
library(microbenchmark)
n = 1e6
v = sort(rnorm(n, 0, 15))
x = runif(1, -15, 15)
microbenchmark(max(which(v <= x)), sum(v <= x), findInterval(x, v), foo(x, v))
Result:
Benchmark based on Егор-Шишунов's answer:
# Functions:
Rcpp::cppFunction(
"int Erop(double x, const Rcpp::NumericVector& v)
{
int min = 0;
int max = v.size();
while (max - min > 1)
{
int idx = (min + max) / 2;
if (v[idx] > x)
{
max = idx;
}
else
{
min = idx;
}
}
return min + 1;
}"
)
Rcpp::cppFunction(
"int GKi(double v, const Rcpp::NumericVector& x) {
return std::distance(x.begin(), std::upper_bound(x.begin(), x.end(), v));
}")
Rcpp::cppFunction("
Rcpp::IntegerVector GKi2(const Rcpp::NumericVector& v
, const Rcpp::NumericVector& x) {
Rcpp::IntegerVector res(v.length());
for(int i=0; i < res.length(); ++i) {
res[i] = std::distance(x.begin(), std::upper_bound(x.begin(), x.end(), v[i]));
}
return res;
}")
# Data:
set.seed(42)
x <- sort(rnorm(1e6))
v <- sort(c(sample(x, 15), rnorm(15)))
# Result:
bench::mark(whichMax= sapply(v, \(v) max(which(x <= v)))
, sum = sapply(v, \(v) sum(x<=v))
, findInterval = findInterval(v, x)
, Erop = sapply(v, \(v) Erop(v, x))
, GKi = sapply(v, \(v) GKi(v, x))
, GKi2 = GKi2(v, x)
)
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl>
#1 whichMax 92.03ms 102.32ms 9.15 NA 102. 5 56
#2 sum 74.91ms 77.84ms 12.0 NA 37.9 6 19
#3 findInterval 680.41µs 755.61µs 1263. NA 0 632 0
#4 Erop 57.19µs 62.13µs 12868. NA 24.0 6432 12
#5 GKi 54.53µs 60.4µs 13316. NA 24.0 6657 12
#6 GKi2 2.02µs 2.38µs 386027. NA 0 10000 0
You can use findInterval which makes use of a binary search.
findInterval(v, x)
#[1] 3
Or using C++ upper_bound
with Rcpp
.
Rcpp::cppFunction(
"int upper_bound(double v, const Rcpp::NumericVector& x) {
return std::distance(x.begin(), std::upper_bound(x.begin(), x.end(), v));
}")
upper_bound(v, x)
#[1] 3
Or in case you have also a vector of positions like in findInterval
.
Rcpp::cppFunction("
Rcpp::IntegerVector upper_bound2(const Rcpp::NumericVector& v
, const Rcpp::NumericVector& x) {
Rcpp::IntegerVector res(v.length());
for(int i=0; i < res.length(); ++i) {
res[i] = std::distance(x.begin(), std::upper_bound(x.begin(), x.end(), v[i]));
}
return res;
}")
v <- c(3, 6.45)
upper_bound2(v, x)
#[1] 1 3
findInterval(v, x)
#[1] 1 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With