Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find the three closest (nearest) values within a vector?

Tags:

r

I would like to find out the three closest numbers in a vector. Something like

v = c(10,23,25,26,38,50)
c = findClosest(v,3)
c
23 25 26

I tried with sort(colSums(as.matrix(dist(x))))[1:3], and it kind of works, but it selects the three numbers with minimum overall distance not the three closest numbers.

There is already an answer for matlab, but I do not know how to translate it to R:

%finds the index with the minimal difference in A
minDiffInd = find(abs(diff(A))==min(abs(diff(A))));
%extract this index, and it's neighbor index from A
val1 = A(minDiffInd);
val2 = A(minDiffInd+1);

How to find two closest (nearest) values within a vector in MATLAB?

like image 799
Terry Avatar asked Jul 31 '19 08:07

Terry


People also ask

How to find the nearest or nearest number in an array?

Find the closest or nearest number with array formula. For example, you have a list of numbers in Column A, and now you will find the closest value or the nearest value of 18 from the Column A. You can do it as follows: Select a blank cell, and enter below formula, and press the Ctrl + Shift + Enter keys together.

How do you find the closest vector to a specified value?

Finding the value in a vector that is closest to a specified value is straightforward using which. Here, we want to find the value of xv that is closest to 108.0: which(abs(xv-108)==min(abs(xv-108))) [1] 332. The closest value to 108.0 is in location 332.

How to find the closest value to the given value?

Excel Find Closest Value 1 Select the range where you will search for closest values to the give value, and then click Kutools > Select > Select... 2 In the opening Select Specific Cells dialog box, See More....

How to select all closest values to a range in Excel?

Sometimes, you may want to find out and select all closet values to the given value in a range. Actually, we can define a deviation value, and then apply Kutools for Excel’s Select Special Cells utility to find out and select all closest values within the diviation range of give value easily.


4 Answers

My assumption is that the for the n nearest values, the only thing that matters is the difference between the v[i] - v[i - (n-1)]. That is, finding the minimum of diff(x, lag = n - 1L).

findClosest <- function(x, n) {
  x <- sort(x)
  x[seq.int(which.min(diff(x, lag = n - 1L)), length.out = n)]
}

findClosest(v, 3L)

[1] 23 25 26
like image 104
Cole Avatar answered Oct 21 '22 05:10

Cole


Let's define "nearest numbers" by "numbers with minimal sum of L1 distances". You can achieve what you want by a combination of diff and windowed sum.

You could write a much shorter function but I wrote it step by step to make it easier to follow.

v <- c(10,23,25,26,38,50)

#' Find the n nearest numbers in a vector
#'
#' @param v Numeric vector
#' @param n Number of nearest numbers to extract
#'
#' @details "Nearest numbers" defined as the numbers which minimise the
#'   within-group sum of L1 distances.
#'   
findClosest <- function(v, n) {
  # Sort and remove NA
  v <- sort(v, na.last = NA)

  # Compute L1 distances between closest points. We know each point is next to
  # its closest neighbour since we sorted.
  delta <- diff(v)

  # Compute sum of L1 distances on a rolling window with n - 1 elements
  # Why n-1 ? Because we are looking at deltas and 2 deltas ~ 3 elements.
  withingroup_distances <- zoo::rollsum(delta, k = n - 1)

  # Now it's simply finding the group with minimum within-group sum
  # And working out the elements
  group_index <- which.min(withingroup_distances)
  element_indices <- group_index + 0:(n-1)

  v[element_indices]
}

findClosest(v, 2)
# 25 26
findClosest(v, 3)
# 23 25 26
like image 7
asachet Avatar answered Oct 21 '22 04:10

asachet


A base R option, idea being we first sort the vector and subtract every ith element with i + n - 1 element in the sorted vector and select the group which has minimum difference.

closest_n_vectors <- function(v, n) {
   v1 <- sort(v)
   inds <- which.min(sapply(head(seq_along(v1), -(n - 1)), function(x) 
                     v1[x + n -1] - v1[x]))
   v1[inds: (inds + n - 1)]
}

closest_n_vectors(v, 3)
#[1] 23 25 26

closest_n_vectors(c(2, 10, 1, 20, 4, 5, 23), 2)
#[1] 1 2

closest_n_vectors(c(19, 23, 45, 67, 89, 65, 1), 2)
#[1] 65 67

closest_n_vectors(c(19, 23, 45, 67, 89, 65, 1), 3)
#[1]  1 19 23

In case of tie this will return the numbers with smallest value since we are using which.min.


BENCHMARKS

Since we have got quite a few answers, it is worth doing a benchmark of all the solutions till now

set.seed(1234)
x <- sample(100000000, 100000)

identical(findClosest_antoine(x, 3), findClosest_Sotos(x, 3), 
          closest_n_vectors_Ronak(x, 3), findClosest_Cole(x, 3))
#[1] TRUE

microbenchmark::microbenchmark(
    antoine = findClosest_antoine(x, 3),
    Sotos = findClosest_Sotos(x, 3), 
    Ronak  = closest_n_vectors_Ronak(x, 3),
    Cole = findClosest_Cole(x, 3),
    times = 10
)



#Unit: milliseconds
#  expr      min       lq     mean   median       uq      max neval cld
#antoine  148.751  159.071  163.298  162.581  167.365  181.314    10  b 
#  Sotos 1086.098 1349.762 1372.232 1398.211 1453.217 1553.945    10   c
#  Ronak   54.248   56.870   78.886   83.129   94.748  100.299    10 a  
#   Cole    4.958    5.042    6.202    6.047    7.386    7.915    10 a  
like image 7
Ronak Shah Avatar answered Oct 21 '22 06:10

Ronak Shah


An idea is to use zoo library to do a rolling operation, i.e.

library(zoo)
m1 <- rollapply(v, 3, by = 1, function(i)c(sum(diff(i)), c(i)))
m1[which.min(m1[, 1]),][-1]
#[1] 23 25 26

Or make it into a function,

findClosest <- function(vec, n) {
    require(zoo)
    vec1 <- sort(vec)
    m1 <- rollapply(vec1, n, by = 1, function(i) c(sum(diff(i)), c(i)))
    return(m1[which.min(m1[, 1]),][-1])
}

findClosest(v, 3)
#[1] 23 25 26
like image 6
Sotos Avatar answered Oct 21 '22 04:10

Sotos