I would like to find out the three closest numbers in a vector. Something like <pre class="prettyprint"><code>v = c(10,23,25,26,38,50) c = findClosest(v,3) c 23 25 26 </code></pre> I tried with <code>sort(colSums(as.matrix(dist(x))))[1:3]</code>, and it kind of works, but it selects the three numbers with minimum overall distance not the three closest numbers. There is already an answer for matlab, but I do not know how to translate it to R: <pre class="prettyprint"><code>%finds the index with the minimal difference in A minDiffInd = find(abs(diff(A))==min(abs(diff(A)))); %extract this index, and it's neighbor index from A val1 = A(minDiffInd); val2 = A(minDiffInd+1); </code></pre> How to find two closest (nearest) values within a vector in MATLAB?

My assumption is that the for the <code>n</code> nearest values, the only thing that matters is the difference between the <code>v[i] - v[i - (n-1)]</code>. That is, finding the minimum of <code>diff(x, lag = n - 1L)</code>. <pre class="prettyprint"><code>findClosest <- function(x, n) { x <- sort(x) x[seq.int(which.min(diff(x, lag = n - 1L)), length.out = n)] } findClosest(v, 3L) [1] 23 25 26 </code></pre>

Let's define "nearest numbers" by "numbers with minimal sum of L1 distances". You can achieve what you want by a combination of <code>diff</code> and windowed sum. You could write a much shorter function but I wrote it step by step to make it easier to follow. <pre class="prettyprint lang-r prettyprint-override"><code>v <- c(10,23,25,26,38,50) #' Find the n nearest numbers in a vector #' #' @param v Numeric vector #' @param n Number of nearest numbers to extract #' #' @details "Nearest numbers" defined as the numbers which minimise the #' within-group sum of L1 distances. #' findClosest <- function(v, n) { # Sort and remove NA v <- sort(v, na.last = NA) # Compute L1 distances between closest points. We know each point is next to # its closest neighbour since we sorted. delta <- diff(v) # Compute sum of L1 distances on a rolling window with n - 1 elements # Why n-1 ? Because we are looking at deltas and 2 deltas ~ 3 elements. withingroup_distances <- zoo::rollsum(delta, k = n - 1) # Now it's simply finding the group with minimum within-group sum # And working out the elements group_index <- which.min(withingroup_distances) element_indices <- group_index + 0:(n-1) v[element_indices] } findClosest(v, 2) # 25 26 findClosest(v, 3) # 23 25 26 </code></pre>

An idea is to use <code>zoo</code> library to do a rolling operation, i.e. <pre class="prettyprint"><code>library(zoo) m1 <- rollapply(v, 3, by = 1, function(i)c(sum(diff(i)), c(i))) m1[which.min(m1[, 1]),][-1] #[1] 23 25 26 </code></pre> Or make it into a function, <pre class="prettyprint"><code>findClosest <- function(vec, n) { require(zoo) vec1 <- sort(vec) m1 <- rollapply(vec1, n, by = 1, function(i) c(sum(diff(i)), c(i))) return(m1[which.min(m1[, 1]),][-1]) } findClosest(v, 3) #[1] 23 25 26 </code></pre>

How to find the three closest (nearest) values within a vector?

Tags:

r

I would like to find out the three closest numbers in a vector. Something like

v = c(10,23,25,26,38,50)
c = findClosest(v,3)
c
23 25 26

I tried with sort(colSums(as.matrix(dist(x))))[1:3], and it kind of works, but it selects the three numbers with minimum overall distance not the three closest numbers.

There is already an answer for matlab, but I do not know how to translate it to R:

%finds the index with the minimal difference in A
minDiffInd = find(abs(diff(A))==min(abs(diff(A))));
%extract this index, and it's neighbor index from A
val1 = A(minDiffInd);
val2 = A(minDiffInd+1);

How to find two closest (nearest) values within a vector in MATLAB?

799

asked Jul 31 '19 08:07

Terry

4 Answers

My assumption is that the for the n nearest values, the only thing that matters is the difference between the v[i] - v[i - (n-1)]. That is, finding the minimum of diff(x, lag = n - 1L).

findClosest <- function(x, n) {
  x <- sort(x)
  x[seq.int(which.min(diff(x, lag = n - 1L)), length.out = n)]
}

findClosest(v, 3L)

[1] 23 25 26

104

answered Oct 21 '22 05:10

Cole

Let's define "nearest numbers" by "numbers with minimal sum of L1 distances". You can achieve what you want by a combination of diff and windowed sum.

You could write a much shorter function but I wrote it step by step to make it easier to follow.

v <- c(10,23,25,26,38,50)

#' Find the n nearest numbers in a vector
#'
#' @param v Numeric vector
#' @param n Number of nearest numbers to extract
#'
#' @details "Nearest numbers" defined as the numbers which minimise the
#'   within-group sum of L1 distances.
#'   
findClosest <- function(v, n) {
  # Sort and remove NA
  v <- sort(v, na.last = NA)

  # Compute L1 distances between closest points. We know each point is next to
  # its closest neighbour since we sorted.
  delta <- diff(v)

  # Compute sum of L1 distances on a rolling window with n - 1 elements
  # Why n-1 ? Because we are looking at deltas and 2 deltas ~ 3 elements.
  withingroup_distances <- zoo::rollsum(delta, k = n - 1)

  # Now it's simply finding the group with minimum within-group sum
  # And working out the elements
  group_index <- which.min(withingroup_distances)
  element_indices <- group_index + 0:(n-1)

  v[element_indices]
}

findClosest(v, 2)
# 25 26
findClosest(v, 3)
# 23 25 26

answered Oct 21 '22 04:10

asachet

A base R option, idea being we first sort the vector and subtract every ith element with i + n - 1 element in the sorted vector and select the group which has minimum difference.

closest_n_vectors <- function(v, n) {
   v1 <- sort(v)
   inds <- which.min(sapply(head(seq_along(v1), -(n - 1)), function(x) 
                     v1[x + n -1] - v1[x]))
   v1[inds: (inds + n - 1)]
}

closest_n_vectors(v, 3)
#[1] 23 25 26

closest_n_vectors(c(2, 10, 1, 20, 4, 5, 23), 2)
#[1] 1 2

closest_n_vectors(c(19, 23, 45, 67, 89, 65, 1), 2)
#[1] 65 67

closest_n_vectors(c(19, 23, 45, 67, 89, 65, 1), 3)
#[1]  1 19 23

In case of tie this will return the numbers with smallest value since we are using which.min.

BENCHMARKS

Since we have got quite a few answers, it is worth doing a benchmark of all the solutions till now

set.seed(1234)
x <- sample(100000000, 100000)

identical(findClosest_antoine(x, 3), findClosest_Sotos(x, 3), 
          closest_n_vectors_Ronak(x, 3), findClosest_Cole(x, 3))
#[1] TRUE

microbenchmark::microbenchmark(
    antoine = findClosest_antoine(x, 3),
    Sotos = findClosest_Sotos(x, 3), 
    Ronak  = closest_n_vectors_Ronak(x, 3),
    Cole = findClosest_Cole(x, 3),
    times = 10
)



#Unit: milliseconds
#  expr      min       lq     mean   median       uq      max neval cld
#antoine  148.751  159.071  163.298  162.581  167.365  181.314    10  b 
#  Sotos 1086.098 1349.762 1372.232 1398.211 1453.217 1553.945    10   c
#  Ronak   54.248   56.870   78.886   83.129   94.748  100.299    10 a  
#   Cole    4.958    5.042    6.202    6.047    7.386    7.915    10 a

answered Oct 21 '22 06:10

Ronak Shah

An idea is to use zoo library to do a rolling operation, i.e.

library(zoo)
m1 <- rollapply(v, 3, by = 1, function(i)c(sum(diff(i)), c(i)))
m1[which.min(m1[, 1]),][-1]
#[1] 23 25 26

Or make it into a function,

findClosest <- function(vec, n) {
    require(zoo)
    vec1 <- sort(vec)
    m1 <- rollapply(vec1, n, by = 1, function(i) c(sum(diff(i)), c(i)))
    return(m1[which.min(m1[, 1]),][-1])
}

findClosest(v, 3)
#[1] 23 25 26

answered Oct 21 '22 04:10

Sotos

Related questions
                            
                                Make y-axis logarithmic in histogram using R [duplicate]
                            
                                Could not find function inside clusterApply
                            
                                Mahalonobis distance in R, error: system is computationally singular
                            
                                Add comma to numbers every three digits in datatable (R)
                            
                                Using custom OTF fonts in ggplot2
                            
                                Convenient way to access variables label after importing Stata data with haven
                            
                                In R plotly subplot graph, how to show only one legend?
                            
                                R: combine several gsub() function in a pipe
                            
                                if_else() `false` must be type double, not integer - in R
                            
                                How to sort data by column in descending order in R
                            
                                max and min functions that are similar to colMeans
                            
                                Barplot with 2 variables side by side
                            
                                ggplot set scale_color_gradientn manually
                            
                                Add a popup with error, warning to shiny
                            
                                R shiny conditionalPanel output value
                            
                                Load a small random sample from a large csv file into R data frame
                            
                                Convert Factor to Date/Time in R
                            
                                Is it possible to push/pull variables between two instances of R?
                            
                                Extract last non-missing value in row with data.table
                            
                                R Plotly Deselect trace by default

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With