Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Round to nearest arbitrary number from list

Tags:

r

I'm basically looking for a way to do a variation of this Ruby script in R.
I have an arbitrary list of numbers (steps of a moderator for a regression plot in this case) which have unequal distances from each other, and I'd like to round values which are within a range around these numbers to the nearest number in the list. The ranges don't overlap.

arbitrary.numbers <- c(4,10,15) / 10
numbers <- c(16:1 / 10, 0.39, 1.45)
range <- 0.1

Expected output:

numbers
## 1.6 1.5 1.4 1.3 1.2 1.1 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.39 1.45
round_to_nearest_neighbour_in_range(numbers,arbitrary.numbers,range)
## 1.5 1.5 1.5 1.3 1.2 1.0 1.0 1.0 0.8 0.7 0.6 0.4 0.4 0.4 0.2 0.1 0.4 1.5

I've got a little helper function that might do for my specific problem, but it's not very flexible and it contains a loop. I can post it here, but I think a real solution would look completely different.

The different answers timed for speed (on a million numbers)

> numbers = rep(numbers,length.out = 1000000)
> system.time({ mvg.round(numbers,arbitrary.numbers,range) })[3]
elapsed 
  0.067 
> system.time({ rinker.loop.round(numbers,arbitrary.numbers,range) })[3]
elapsed 
  0.289 
> system.time({ rinker.round(numbers,arbitrary.numbers,range) })[3]
elapsed 
  1.403 
> system.time({ nograpes.round(numbers,arbitrary.numbers,range) })[3]
elapsed 
  1.971 
> system.time({ january.round(numbers,arbitrary.numbers,range) })[3]
elapsed 
  16.12 
> system.time({ shariff.round(numbers,arbitrary.numbers,range) })[3]
elapsed 
15.833 
> system.time({ mplourde.round(numbers,arbitrary.numbers,range) })[3]
elapsed 
  9.613 
> system.time({ kohske.round(numbers,arbitrary.numbers,range) })[3]
elapsed 
 26.274 

MvG's function is the fastest, about 5 times faster than Tyler Rinker's second function.

like image 321
Ruben Avatar asked Oct 12 '12 14:10

Ruben


6 Answers

A vectorized solution, without any apply family functions or loops:

The key is findInterval, which finds the "space" in arbitrary.numbers where each element in numbers is "between". So, findInterval(6,c(2,4,7,8)) returns 2, because 6 is between the 2nd and 3rd index of c(2,4,7,8).

# arbitrary.numbers is assumed to be sorted.
# find the index of the number just below each number, and just above.
# So for 6 in c(2,4,7,8) we would find 2 and 3.
low<-findInterval(numbers,arbitrary.numbers) # find index of number just below
high<-low+1 # find the corresponding index just above.

# Find the actual absolute difference between the arbitrary number above and below.
# So for 6 in c(2,4,7,8) we would find 2 and 1. 
# (The absolute differences to 4 and 7).
low.diff<-numbers-arbitrary.numbers[ifelse(low==0,NA,low)]
high.diff<-arbitrary.numbers[ifelse(high==0,NA,high)]-numbers

# Find the minimum difference. 
# In the example we would find that 6 is closest to 7, 
# because the difference is 1.
mins<-pmin(low.diff,high.diff,na.rm=T) 
# For each number, pick the arbitrary number with the minimum difference.
# So for 6 pick out 7.
pick<-ifelse(!is.na(low.diff) & mins==low.diff,low,high)

# Compare the actual minimum difference to the range. 
ifelse(mins<=range+.Machine$double.eps,arbitrary.numbers[pick],numbers)
# [1] 1.5 1.5 1.5 1.3 1.2 1.0 1.0 1.0 0.8 0.7 0.6 0.4 0.4 0.4 0.2 0.1 0.4 1.5
like image 134
nograpes Avatar answered Oct 01 '22 01:10

nograpes


Yet another solution using findInterval:

arbitrary.numbers<-sort(arbitrary.numbers)          # need them sorted
range <- range*1.000001                             # avoid rounding issues
nearest <- findInterval(numbers, arbitrary.numbers - range) # index of nearest
nearest <- c(-Inf, arbitrary.numbers)[nearest + 1]  # value of nearest
diff <- numbers - nearest                           # compute errors
snap <- diff <= range                               # only snap near numbers
numbers[snap] <- nearest[snap]                      # snap values to nearest
print(numbers)

The nearest in the above code is not really mathematically the nearest number. Instead, it is the largest arbitrary number such that nearest[i] - range <= numbers[i], or equivalently nearest[i] <= numbers[i] + range. So in one go we find the largest arbitrary number which is either in the snapping range for a given input number, or still too small for that. For this reason, we only need to check one way for snap. No absolute value required, and even the squaring from a previous revision of this post was unneccessary.

Thanks to Interval search on a data frame for the pointer at findInterval, as I found it there before recognizing it in the answer by nograpes.

If, in contrast to your original question, you had overlapping ranges, you could write things like this:

arbitrary.numbers<-sort(arbitrary.numbers)        # need them sorted
range <- range*1.000001                           # avoid rounding issues
nearest <- findInterval(numbers, arbitrary.numbers) + 1 # index of interval
hi <- c(arbitrary.numbers, Inf)[nearest]          # next larger
nearest <- c(-Inf, arbitrary.numbers)[nearest]    # next smaller
takehi <- (hi - numbers) < (numbers - nearest)    # larger better than smaller
nearest[takehi] <- hi[takehi]                     # now nearest is really nearest
snap <- abs(nearest - numbers) <= range           # only snap near numbers
numbers[snap] <- nearest[snap]                    # snap values to nearest
print(numbers)

In this code, nearestreally ends up being the nearest number. This is achieved by considering both endpoints of every interval. In spirit, this is very much like the version by nograpes, but it avoids using ifelse and NA, which should benefit performance as it reduces the number of branching instructions.

like image 35
MvG Avatar answered Oct 01 '22 01:10

MvG


Is this what you want?

> idx <- abs(outer(arbitrary.numbers, numbers, `-`)) <= (range+.Machine$double.eps)
> rounded <- arbitrary.numbers[apply(rbind(idx, colSums(idx) == 0), 2, which)]
> ifelse(is.na(rounded), numbers, rounded)
 [1] 1.5 1.5 1.5 1.3 1.2 1.0 1.0 1.0 0.8 0.7 0.6 0.4 0.4 0.4 0.2 0.1 0.4 1.5
like image 37
kohske Avatar answered Oct 01 '22 01:10

kohske


Please note that due to rounding errors (most likely), I use range = 0.1000001 to achieve the expected effect.

range <- range + 0.0000001

blah <- rbind( numbers, sapply( numbers, function( x ) abs( x - arbitrary.numbers ) ) )
ff <- function( y ) { if( min( y[-1] ) <= range + 0.000001 ) arbitrary.numbers[ which.min( y[ -1 ] ) ] else  y[1]  }
apply( blah, 2, ff )
like image 34
January Avatar answered Oct 01 '22 01:10

January


Another option:

arb.round <- function(numbers, arbitrary.numbers, range) {
    arrnd <- function(x, ns, r){ 
        ifelse(abs(x - ns) <= range +.00000001, ns, x)
    }
    lapply(1:length(arbitrary.numbers), function(i){
            numbers <<- arrnd(numbers, arbitrary.numbers[i], range)
        }
    )
    numbers
}

arb.round(numbers, arbitrary.numbers, range)

Yields:

> arb.round(numbers, arbitrary.numbers, range)
[1] 1.5 1.5 1.5 1.3 1.2 1.0 1.0 1.0 0.8 0.7 0.6 0.4 0.4 0.4 0.2 0.1 0.4 1.5

EDIT: I removed the return call at the end of the function as it's not necessary adn can burn time.

EDIT: I think a loop will be even faster here:

loop.round <- function(numbers, arbitrary.numbers, range) {
    arrnd <- function(x, ns, r){ 
        ifelse(abs(x - ns) <= range +.00000001, ns, x)
    }
    for(i in seq_along(arbitrary.numbers)){
            numbers <- arrnd(numbers, arbitrary.numbers[i], range)
    }
    numbers
}
like image 24
Tyler Rinker Avatar answered Oct 01 '22 03:10

Tyler Rinker


This is still shorter:

sapply(numbers, function(x) ifelse(min(abs(arbitrary.numbers - x)) > 
range + .Machine$double.eps, x, arbitrary.numbers[which.min
(abs(arbitrary.numbers - x))] ))

Thanks @MvG

like image 43
Ali Avatar answered Oct 01 '22 03:10

Ali