
How to find the difference between a value and its closest value in a vector in R?

Tags: r, statistics

I have a vector like the one below:

x <- c(1, 23, 4, 15, 8, 17, 21)

After sorting the values, we have:

c(1,4,8,15,17,21,23)

My required output is:

c(3, 3, 4, 2, 2, 2, 2)

It contains, for each value, the difference between that value and its closest value.

Is there also a way to get this output in the original (unsorted) order of x? I need output like c(3, 2, 3, 2, 4, 2, 2) so that I can tell which sample has the largest value in the output (here it is the 5th value).

star asked Jan 26 '16 14:01


3 Answers

d <- diff(sort(x))                      # gaps between consecutive sorted values
pmin(c(d, NA), c(NA, d), na.rm = TRUE)  # smaller of the right and left gap for each value
# [1] 3 3 4 2 2 2 2
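
The snippet above returns the result in sorted order. The question also asks for the result in the original order of x. One possible way to get that, sketched here and not part of the original answer, is to sort via order() and write the results back through the same index. The names o, nearest_sorted and nearest are chosen only for illustration.

```r
x <- c(1, 23, 4, 15, 8, 17, 21)

o <- order(x)                 # positions of x in ascending order
d <- diff(x[o])               # gaps between consecutive sorted values

# Smaller of the right and left gap for each sorted value
nearest_sorted <- pmin(c(d, NA), c(NA, d), na.rm = TRUE)

# Write the results back to the positions the values had in x
nearest <- numeric(length(x))
nearest[o] <- nearest_sorted

print(nearest)
# [1] 3 2 3 2 4 2 2

print(which.max(nearest))     # index of the most isolated value
# [1] 5
```

Assigning through nearest[o] undoes the sort without a second call to order(), so the 5th element (the value 8) comes out as the one farthest from its nearest neighbour.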
Julius Vainora answered Oct 30 '22 06:10


If I understand you correctly, you want to calculate the smallest difference between each member of a vector and its neighbours.

First, we sort the data.

x <- sort(c(1, 23, 4, 15, 8, 17, 21))

Then, we calculate the difference with the left neighbour (which is missing for the first item) and the difference with the right neighbour (which is missing for the last item).

diffs <- cbind(c(NA,diff(x)),c(diff(x),NA))

Now we have the difference to the left and to the right for each item; all that's left is to find the smaller of the two:

res <- apply(diffs, MARGIN = 1, min, na.rm = TRUE)

Note that while this solution contains an explanation, other provided solutions (notably the pmin-approach by @Julius) are probably faster when performance is an issue.
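
To make the steps above concrete, here is a minimal sketch that prints the intermediate two-column matrix for the example vector from the question. The column names left and right are added here only for readability and are not in the original answer.

```r
x <- sort(c(1, 23, 4, 15, 8, 17, 21))

left  <- c(NA, diff(x))   # gap to the left neighbour (none for the first item)
right <- c(diff(x), NA)   # gap to the right neighbour (none for the last item)
diffs <- cbind(left, right)
print(diffs)              # one row per sorted value, NA where a neighbour is missing

# Row-wise minimum, ignoring the NA at either end
res <- apply(diffs, MARGIN = 1, min, na.rm = TRUE)
print(res)
# [1] 3 3 4 2 2 2 2
```

Each row holds the two candidate gaps for one value, so the row-wise minimum is the distance to its nearest neighbour.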

Heroka answered Oct 30 '22 04:10


Nice solutions. Julius' seems to be the fastest:

library(microbenchmark)
library(dplyr)  # assumed: lag() and lead() in the NicE variant are dplyr's
set.seed(1262016)
x <- sample(1e5)

all.equal(heroka, NicE, julius, Ambler)
[1] TRUE

microbenchmark(

  julius = {d <- diff(sort(x))
  pmin(c(d, NA), c(NA, d), na.rm = TRUE)},

  NicE = {x <- sort(x)
  pmin(abs(x-lag(x)),abs(x-lead(x)),na.rm=T)},

  Heroka = {x= sort(x)
  diffs <- cbind(c(NA,diff(x)),c(diff(x),NA))
  apply(diffs,MARGIN=1, min, na.rm=T)},

  Ambler = {n <- length(x)
  ds <- c(
    x[2] - x[1], 
    sapply(
      2:(n - 1), 
      function(i) min(x[i] - x[i - 1], x[i + 1] - x[i])
    ),
    x[n] - x[n - 1]
  )}
)
# Unit: milliseconds
#   expr        min         lq      mean     median        uq       max neval
# julius   4.167302   5.066164  13.94478   7.967066  10.11920  89.06298   100
# NicE     4.678274   6.804918  13.85149   9.297575  12.45606  83.41032   100
# Heroka 142.107887 176.768431 199.96590 196.269671 221.05851 299.30336   100
# Ambler 268.724129 309.238792 334.66432 329.252146 359.88103 409.38698   100
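
One caveat about the all.equal() check above: all.equal() compares only its first two arguments (target and current), so a single call does not verify four results at once. A hedged sketch of a pairwise check on the small example from the question follows. It covers only the base R variants (the NicE variant is left out because it relies on dplyr), and the names results and agree are illustrative.

```r
x <- c(1, 23, 4, 15, 8, 17, 21)
s <- sort(x)
d <- diff(s)
n <- length(s)

# The three base R approaches, each applied to the same sorted vector
julius <- pmin(c(d, NA), c(NA, d), na.rm = TRUE)
heroka <- apply(cbind(c(NA, d), c(d, NA)), MARGIN = 1, min, na.rm = TRUE)
ambler <- c(
  s[2] - s[1],
  sapply(2:(n - 1), function(i) min(s[i] - s[i - 1], s[i + 1] - s[i])),
  s[n] - s[n - 1]
)

# Compare every other result against the first one, pair by pair
results <- list(julius = julius, heroka = heroka, ambler = ambler)
agree <- vapply(
  results[-1],
  function(r) isTRUE(all.equal(results[[1]], r)),
  logical(1)
)
print(agree)
```

If every entry of agree is TRUE, the approaches produce the same distances on this input.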
Pierre L answered Oct 30 '22 04:10