Consider a sorted vector x
that is bounded between min
and max
. Below is an example of such x
where min
could be 0
and max
could be 12
:
x = c(0.012, 1, exp(1), exp(1)+1e-55, exp(1)+1e-10,
exp(1)+1e-3, 3.3, 3.33333, 3.333333333333333, 3+1/3, 5, 5, 10, 12)
5
and 5
as well as exp(1)
and exp(1)+10^(-55)
have exactly the same value (to the level of accuracy of a float number). Some other entry differ largely and some others differ only by a small amount. I would like to consider an approximation to equality test
ApproxEqual = function(a,b) abs(a-b) < epsilon
, where epsilon
could be 1e-5
for example.
Goal
I would like to modify the values of the variable x
"as little as possible" to ensure that no two values in x
are "approximatively equal" and x
is still bounded between min
and max
.
I am happy to let you decide what "as little as possible" really mean. One could for example minimize the sum of square deviations between the original x
and the expected variable output.
Example 1
x_input = c(5, 5.1, 5.1, 5.1, 5.2)
min=1
max=100
x_output = c(5, 5.1-epsilon, 5.1, 5.1+epsilon, 5.2)
Example 2
x_input = c(2,2,2,3,3)
min=2
max=3
x_output = c(2, 2+epsilon, 2+2*epsilon, 2+3*epsilon, 3-epsilon,3)
Of course, in the above case if (3-epsilon) - (2+3*epsilon) < epsilon
is TRUE
, then the function should throw an error as the problem has no solution.
Side Note
I would love if the solution is quite performant. answer could make could use of Rcpp
for example.
I doubt this is possible without iterating, because shifting some points away from neighbours that are too close could cause the moved points to bunch up closer to their other neighbours. Here is one solution that only changes those values that are necessary to arrive at a solution, and moves them by the smallest distance it can to ensure a minimum gap of epsilon.
It uses a function that assigns a force to each point depending whether we need to move it away from a neighbour that is too close. Direction (sign) of the force indicates whether we need to increase or decrease the value of that point. Points that are sandwiched between other too-near neighbours do not move, but their outside neighbours both move away from the central point (this behaviour to move as few points as we can by as little as possible). The force assigned to the end points is always zero, because we do not want overall range of x to change
force <- function(x, epsilon){
c(0, sapply(2:(length(x)-1), function(i){ (x[i] < (x[i-1]+epsilon)) - (x[i] > (x[i+1]-epsilon)) }), 0)
}
Next, we need a function to shift points, depending on the force acting on them. Positive forces cause them to move to epsilon higher than the previous point. Negative forces shift them downwards.
move <- function(x, epsilon, f){
x[which(f==-1)] <- x[which(f==-1)+1] - epsilon
x[which(f==1)] <- x[which(f==1)-1] + epsilon
# Next line deals with boundary condition, and prevents points from bunching up at the edges of the range
# I doubt this is necessary, but included out of abundance of caution. Could try deleting this line if performance is an issue.
x <- sapply(1:(length(x)), function(i){x[i] <- max(x[i], head(x,1)+(i-1)*epsilon); x[i] <- min(x[i], tail(x,1)-(length(x)-i)*epsilon)})
x
}
Finally, the function separate
is used to iteratively calculate force and move points until a solution is found. It also checks for a couple of edge cases before iterating.
separate <- function(x,epsilon) {
if (epsilon > (range(x)[2] - range(x)[1]) / (length(x) - 1)) stop("no solution possible")
if (!(all(diff(x)>=0))) stop ("vector must be sorted, ascending")
initial.x <- x
solved <- FALSE
##################################
# A couple of edge cases to catch
##################################
# 1. catch cases when vector length < 3 (nothing to do, as there are no points to move)
if (length(x)<3) solved <- TRUE
# 2. catch cases where initial vector has values too close to the boundaries
x <- sapply(1:(length(x)), function(i){
x[i] <- max(x[i], head(x,1)+(i-1)*epsilon)
x[i] <- min(x[i], tail(x,1)-(length(x)-i)*epsilon)
})
# Now iterate to find solution
it <- 0
while (!solved) {
it <- it+1
f <- force(x, epsilon)
if (sum(abs(f)) == 0) solved <- TRUE
else x <- move(x, epsilon, f)
}
list(xhat=x, iterations=it, SSR=sum(abs(x-initial.x)^2))
}
Testing this on the example provided by OP:
x = c(0.012, 1, exp(1), exp(1)+1e-55, exp(1)+1e-10, exp(1)+1e-3, 3.3, 3.33333, 3.333333333333333, 3+1/3, 5, 5, 10, 12)
epsilon <- 1e-5
separate(x, epsilon)
# $xhat
# [1] 0.012000 1.000000 2.718272 2.718282 2.718292 2.719282 3.300000 3.333323 3.333333 3.333343
# [11] 4.999990 5.000000 10.000000 12.000000
#
# $iterations
# [1] 2
#
# $SSR
# [1] 4.444424e-10
Edit 1
Lines were added to function separate
in response to comment to catch a couple of edge cases -
A) where vector passed to function has length < 3
separate(c(0,1), 1e-5)
# $xhat
# [1] 0 1
#
# $iterations
# [1] 0
#
# $SSR
# [1] 0
B) where passed vector has several values at the boundaries
separate(c(0,0,0,1), 1e-5)
# [1] "it = 1, SSR = 5e-10"
# $xhat
# [1] 0e+00 1e-05 2e-05 1e+00
#
# $iterations
# [1] 1
#
# $SSR
# [1] 5e-10
This was a fun challenge, and I think I've worked out a solution. It's a bit ugly and convoluted and could do with some streamlining, but it seems to return what Remi asked for.
library(magrittr)
xin <- c(0.012, 1, exp(1), exp(1)+10^(-55), exp(1)+10^(-10),
exp(1)+10^(-3), 3.3, 3.33333, 3.333333333333333, 3+1/3, 5, 5, 10, 12)
tiebreaker <- function(x, t=3) {
dif <- diff(x) %>% round(t)
x[dif==0] <- x[dif==0] +
seq(-10^-t, -10^-(t+0.99),
length.out=length(x[dif==0])) %>% sort
x
}
xout <- tiebreaker(xin)
diff(xin) > 0.0001
# TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE
diff(xout) > 0.0001 #it makes close matches less close
# TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
xin == xout #but leaves already less close matches as they were
# TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE
EDIT: I wrapped it up into a simple function. tr
sets the threshold for what's considered a close match, in decimal points.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With