Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Editing array to ensure strictly increasing values

Consider a sorted vector x that is bounded between min and max. Below is an example of such x where min could be 0 and max could be 12:

x = c(0.012, 1, exp(1), exp(1)+1e-55, exp(1)+1e-10,
       exp(1)+1e-3, 3.3, 3.33333, 3.333333333333333, 3+1/3, 5, 5, 10, 12)

5 and 5 as well as exp(1) and exp(1)+10^(-55) have exactly the same value (to the level of accuracy of a float number). Some other entry differ largely and some others differ only by a small amount. I would like to consider an approximation to equality test

ApproxEqual = function(a,b) abs(a-b) < epsilon

, where epsilon could be 1e-5 for example.

Goal

I would like to modify the values of the variable x "as little as possible" to ensure that no two values in x are "approximatively equal" and x is still bounded between min and max.

I am happy to let you decide what "as little as possible" really mean. One could for example minimize the sum of square deviations between the original x and the expected variable output.

Example 1

x_input = c(5, 5.1, 5.1, 5.1, 5.2)
min=1
max=100

x_output = c(5, 5.1-epsilon, 5.1, 5.1+epsilon, 5.2)

Example 2

x_input = c(2,2,2,3,3)
min=2
max=3

x_output = c(2, 2+epsilon, 2+2*epsilon, 2+3*epsilon, 3-epsilon,3)

Of course, in the above case if (3-epsilon) - (2+3*epsilon) < epsilon is TRUE, then the function should throw an error as the problem has no solution.

Side Note

I would love if the solution is quite performant. answer could make could use of Rcpp for example.

like image 463
Remi.b Avatar asked May 30 '16 22:05

Remi.b


2 Answers

I doubt this is possible without iterating, because shifting some points away from neighbours that are too close could cause the moved points to bunch up closer to their other neighbours. Here is one solution that only changes those values that are necessary to arrive at a solution, and moves them by the smallest distance it can to ensure a minimum gap of epsilon.

It uses a function that assigns a force to each point depending whether we need to move it away from a neighbour that is too close. Direction (sign) of the force indicates whether we need to increase or decrease the value of that point. Points that are sandwiched between other too-near neighbours do not move, but their outside neighbours both move away from the central point (this behaviour to move as few points as we can by as little as possible). The force assigned to the end points is always zero, because we do not want overall range of x to change

force <- function(x, epsilon){
 c(0, sapply(2:(length(x)-1), function(i){ (x[i] < (x[i-1]+epsilon)) - (x[i] > (x[i+1]-epsilon)) }), 0)
}

Next, we need a function to shift points, depending on the force acting on them. Positive forces cause them to move to epsilon higher than the previous point. Negative forces shift them downwards.

move <- function(x, epsilon, f){
  x[which(f==-1)] <- x[which(f==-1)+1] - epsilon 
  x[which(f==1)]  <- x[which(f==1)-1] + epsilon
  # Next line deals with boundary condition, and prevents points from bunching up at the edges of the range
  # I doubt this is necessary, but included out of abundance of caution. Could try deleting this line if performance is an issue.
  x <- sapply(1:(length(x)), function(i){x[i] <- max(x[i], head(x,1)+(i-1)*epsilon); x[i] <- min(x[i], tail(x,1)-(length(x)-i)*epsilon)})
  x
}

Finally, the function separate is used to iteratively calculate force and move points until a solution is found. It also checks for a couple of edge cases before iterating.

separate <- function(x,epsilon) {
  if (epsilon > (range(x)[2] - range(x)[1]) / (length(x) - 1)) stop("no solution possible")
  if (!(all(diff(x)>=0))) stop ("vector must be sorted, ascending")

  initial.x <- x
  solved <- FALSE

  ##################################
  # A couple of edge cases to catch
  ##################################
  # 1. catch cases when vector length < 3 (nothing to do, as there are no points to move)
  if (length(x)<3) solved <- TRUE
  # 2. catch cases where initial vector has values too close to the boundaries 
  x <- sapply(1:(length(x)), function(i){
    x[i] <- max(x[i], head(x,1)+(i-1)*epsilon)
    x[i] <- min(x[i], tail(x,1)-(length(x)-i)*epsilon)
  })

  # Now iterate to find solution
  it <- 0
  while (!solved) {
    it <-  it+1
    f <- force(x, epsilon)
    if (sum(abs(f)) == 0) solved <- TRUE
    else x <- move(x, epsilon, f)
  }
  list(xhat=x, iterations=it, SSR=sum(abs(x-initial.x)^2))
}

Testing this on the example provided by OP:

x = c(0.012, 1, exp(1), exp(1)+1e-55, exp(1)+1e-10, exp(1)+1e-3, 3.3, 3.33333, 3.333333333333333, 3+1/3, 5, 5, 10, 12)
epsilon <- 1e-5

separate(x, epsilon)
# $xhat
# [1]  0.012000  1.000000  2.718272  2.718282  2.718292  2.719282  3.300000  3.333323  3.333333  3.333343
# [11]  4.999990  5.000000 10.000000 12.000000
#
# $iterations
# [1] 2
#
# $SSR
# [1] 4.444424e-10

Edit 1

Lines were added to function separate in response to comment to catch a couple of edge cases -

A) where vector passed to function has length < 3

separate(c(0,1), 1e-5)
# $xhat
# [1] 0 1
# 
# $iterations
# [1] 0
# 
# $SSR
# [1] 0

B) where passed vector has several values at the boundaries

separate(c(0,0,0,1), 1e-5)
# [1] "it = 1, SSR = 5e-10"
# $xhat
# [1] 0e+00 1e-05 2e-05 1e+00
# 
# $iterations
# [1] 1
#
# $SSR
# [1] 5e-10
like image 180
dww Avatar answered Sep 21 '22 09:09

dww


This was a fun challenge, and I think I've worked out a solution. It's a bit ugly and convoluted and could do with some streamlining, but it seems to return what Remi asked for.

library(magrittr)

xin <- c(0.012, 1, exp(1), exp(1)+10^(-55), exp(1)+10^(-10),
    exp(1)+10^(-3), 3.3, 3.33333, 3.333333333333333, 3+1/3, 5, 5, 10, 12)

tiebreaker <- function(x, t=3) {
    dif <- diff(x) %>% round(t)
    x[dif==0] <- x[dif==0] + 
        seq(-10^-t, -10^-(t+0.99), 
        length.out=length(x[dif==0])) %>% sort
    x
}

xout <- tiebreaker(xin)

diff(xin) > 0.0001
# TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE

diff(xout) > 0.0001  #it makes close matches less close
# TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

xin == xout  #but leaves already less close matches as they were
# TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE

EDIT: I wrapped it up into a simple function. tr sets the threshold for what's considered a close match, in decimal points.

like image 33
AkselA Avatar answered Sep 22 '22 09:09

AkselA