
Closest other Value in the same Vector

Tags: r

I have a vector

set.seed(2)
x <- sample.int(20, 5)

[1]  4 14 11  3 16

Now, for every element I want to find

the element with the minimum distance (min(abs(x[i]-x[-i])) for element i), which here would be

[1]  3 16 14  4 14

the (first) index of the element with the minimum distance, which here would be

[1] 4 5 2 1 2

The point is that the element itself must not be considered, only all the other elements, which is why the existing question "R - Fastest way to find nearest value in vector" does not answer this.

If the actual answer is out there, sorry - I didn't find it.

Georgery asked Dec 02 '22 09:12



1 Answer

1) Rfast Using dista in Rfast we get, for each element, the indexes of its two closest elements. Take the second closest, because the closest is the element itself.

library(Rfast)
x <- c(4, 14, 11, 3, 16) # input

x[ dista(x, x, k = 2, index = TRUE)[, 2] ]
## [1]  3 16 14  4 14

2) sqldf Using SQL we can left join DF to itself, excluding rows with the same value, and take the row with the minimum distance.

library(sqldf)
DF <- data.frame(x)   # x is from (1)
sqldf("select a.x, b.x nearest, min(abs(a.x - b.x)) 
  from DF a 
  left join DF b on a.x != b.x 
  group by a.rowid")[1:2]

giving:

   x nearest
1  4       3
2 14      16
3 11      14
4  3       4
5 16      14

3) zoo Sort the input, for each element take the neighbour on either side with the smaller difference, and then restore the original order.

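library(zoo)
ix <- order(x)  # permutation that sorts x
# for a sorted triple, return whichever outer value is nearer to the middle one
least <- function(x) if (x[2] - x[1] < x[3] - x[2]) x[1] else x[3]
# pad with -Inf/Inf so the end points also have two neighbours, then undo the sort
rollapply(c(-Inf, x[ix], Inf), 3, least)[order(ix)]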
## [1]  3 16 14  4 14

4) Base R Using ix and least from (3) we can mimic (3) using only base functions as follows.

apply(embed(c(-Inf, x[ix], Inf),  3)[, 3:1], 1, least)[order(ix)]
## [1]  3 16 14  4 14

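4a) This slightly shorter variation would also work. Because the negation reverses the order, an exact tie between the two neighbours resolves to the smaller value here, whereas (3), (4) and (4b) return the larger one: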

-apply(embed(-c(-Inf, x[ix], Inf),  3), 1, least)[order(ix)]
## [1]  3 16 14  4 14

4b) Simplifying further we have the following base solution where, again, ix is from (3):

xx <- x[ix]
x1 <- c(-Inf, xx[-length(xx)])
x2 <- c(xx[-1], Inf)
ifelse(xx - x1 < x2 - xx, x1, x2)[order(ix)]
## [1]  3 16 14  4 14
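
The question also asks for the index of the nearest element, which the solutions above do not return directly. As a minimal sketch, assuming the same sort-based idea as (4b), we can record the position of the chosen neighbour in the sorted vector and map it back through ix. The names n, left, pos and idx are introduced here for illustration only. With duplicated values the index returned is not guaranteed to be the first one among equals.

```r
x <- c(4, 14, 11, 3, 16)             # input from (1)
ix <- order(x)                       # permutation that sorts x
xx <- x[ix]
n <- length(xx)
x1 <- c(-Inf, xx[-n])                # left neighbour in sorted order
x2 <- c(xx[-1], Inf)                 # right neighbour in sorted order
left <- xx - x1 < x2 - xx            # TRUE where the left neighbour is strictly closer
pos <- ifelse(left, seq_len(n) - 1, seq_len(n) + 1)  # sorted position of the nearest
idx <- ix[pos][order(ix)]            # original index, restored to input order
idx
## [1] 4 5 2 1 2
x[idx]
## [1]  3 16 14  4 14
```

As in (4b), an exact tie between the two neighbours goes to the right (larger) one.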

Duplicates

The example in the question has no duplicates. With duplicates, the problem definition becomes ambiguous. For example, given c(1, 3, 4, 1), the first value, 1, has another element exactly equal to it, so under one interpretation its closest value is 1. Under another interpretation the closest value not equal to 1 should be returned, which in this case is 3. In the code above, the sqldf solution gives the closest value not equal to the current value, whereas the others give the closest value among the remaining elements.
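
To make the two interpretations concrete, here is a brute-force base R sketch. It is not one of the solutions above, and the names d, d_self, d_equal, self_excluded and equal_excluded are illustrative. It builds the full matrix of pairwise distances, so it is only meant for small inputs.

```r
x <- c(1, 3, 4, 1)
d <- abs(outer(x, x, "-"))           # all pairwise distances

# Interpretation A: exclude only the element itself
d_self <- d
diag(d_self) <- Inf
self_excluded <- x[apply(d_self, 1, which.min)]
self_excluded
## [1] 1 4 3 1

# Interpretation B: exclude every element equal to the current value
d_equal <- d
d_equal[d == 0] <- Inf
equal_excluded <- x[apply(d_equal, 1, which.min)]
equal_excluded
## [1] 3 4 3 3
```

Since which.min returns the first minimum, ties here go to the element that appears first in x rather than to the larger value.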

If we wanted the closest-not-equal interpretation for the solutions other than sqldf, we could use rle after ordering to compress the input down to its unique values and then use inverse.rle to expand the result, as shown in this modification of (4b):

x <- c(1, 3, 4, 1)
ix <- order(x)
r <- rle(x[ix])
xx <- r$values
x1 <- c(-Inf, xx[-length(xx)])
x2 <- c(xx[-1], Inf)
r$values <- ifelse(xx - x1 < x2 - xx, x1, x2)
inverse.rle(r)[order(ix)]
## [1] 3 4 3 3
G. Grothendieck answered Dec 28 '22 23:12
