I'm basically looking for a way to do a variation of this Ruby script in R.
I have an arbitrary list of numbers (steps of a moderator for a regression plot in this case) which have unequal distances from each other, and I'd like to round values which are within a range around these numbers to the nearest number in the list.
The ranges don't overlap.
arbitrary.numbers <- c(4,10,15) / 10
numbers <- c(16:1 / 10, 0.39, 1.45)
range <- 0.1
Expected output:
numbers
## 1.6 1.5 1.4 1.3 1.2 1.1 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.39 1.45
round_to_nearest_neighbour_in_range(numbers,arbitrary.numbers,range)
## 1.5 1.5 1.5 1.3 1.2 1.0 1.0 1.0 0.8 0.7 0.6 0.4 0.4 0.4 0.2 0.1 0.4 1.5
I've got a little helper function that might do for my specific problem, but it's not very flexible and it contains a loop. I can post it here, but I think a real solution would look completely different.
> numbers = rep(numbers,length.out = 1000000)
> system.time({ mvg.round(numbers,arbitrary.numbers,range) })[3]
elapsed
0.067
> system.time({ rinker.loop.round(numbers,arbitrary.numbers,range) })[3]
elapsed
0.289
> system.time({ rinker.round(numbers,arbitrary.numbers,range) })[3]
elapsed
1.403
> system.time({ nograpes.round(numbers,arbitrary.numbers,range) })[3]
elapsed
1.971
> system.time({ january.round(numbers,arbitrary.numbers,range) })[3]
elapsed
16.12
> system.time({ shariff.round(numbers,arbitrary.numbers,range) })[3]
elapsed
15.833
> system.time({ mplourde.round(numbers,arbitrary.numbers,range) })[3]
elapsed
9.613
> system.time({ kohske.round(numbers,arbitrary.numbers,range) })[3]
elapsed
26.274
MvG's function is the fastest, about 5 times faster than Tyler Rinker's second function.
A vectorized solution, without any apply
family functions or loops:
The key is findInterval
, which finds the "space" in arbitrary.numbers
where each element in numbers
is "between". So, findInterval(6,c(2,4,7,8))
returns 2
, because 6
is between the 2nd and 3rd index of c(2,4,7,8)
.
# arbitrary.numbers is assumed to be sorted.
# find the index of the number just below each number, and just above.
# So for 6 in c(2,4,7,8) we would find 2 and 3.
low<-findInterval(numbers,arbitrary.numbers) # find index of number just below
high<-low+1 # find the corresponding index just above.
# Find the actual absolute difference between the arbitrary number above and below.
# So for 6 in c(2,4,7,8) we would find 2 and 1.
# (The absolute differences to 4 and 7).
low.diff<-numbers-arbitrary.numbers[ifelse(low==0,NA,low)]
high.diff<-arbitrary.numbers[ifelse(high==0,NA,high)]-numbers
# Find the minimum difference.
# In the example we would find that 6 is closest to 7,
# because the difference is 1.
mins<-pmin(low.diff,high.diff,na.rm=T)
# For each number, pick the arbitrary number with the minimum difference.
# So for 6 pick out 7.
pick<-ifelse(!is.na(low.diff) & mins==low.diff,low,high)
# Compare the actual minimum difference to the range.
ifelse(mins<=range+.Machine$double.eps,arbitrary.numbers[pick],numbers)
# [1] 1.5 1.5 1.5 1.3 1.2 1.0 1.0 1.0 0.8 0.7 0.6 0.4 0.4 0.4 0.2 0.1 0.4 1.5
Yet another solution using findInterval
:
arbitrary.numbers<-sort(arbitrary.numbers) # need them sorted
range <- range*1.000001 # avoid rounding issues
nearest <- findInterval(numbers, arbitrary.numbers - range) # index of nearest
nearest <- c(-Inf, arbitrary.numbers)[nearest + 1] # value of nearest
diff <- numbers - nearest # compute errors
snap <- diff <= range # only snap near numbers
numbers[snap] <- nearest[snap] # snap values to nearest
print(numbers)
The nearest
in the above code is not really mathematically the nearest number. Instead, it is the largest arbitrary number such that nearest[i] - range <= numbers[i]
, or equivalently nearest[i] <= numbers[i] + range
. So in one go we find the largest arbitrary number which is either in the snapping range for a given input number, or still too small for that. For this reason, we only need to check one way for snap
. No absolute value required, and even the squaring from a previous revision of this post was unneccessary.
Thanks to Interval search on a data frame for the pointer at findInterval
, as I found it there before recognizing it in the answer by nograpes.
If, in contrast to your original question, you had overlapping ranges, you could write things like this:
arbitrary.numbers<-sort(arbitrary.numbers) # need them sorted
range <- range*1.000001 # avoid rounding issues
nearest <- findInterval(numbers, arbitrary.numbers) + 1 # index of interval
hi <- c(arbitrary.numbers, Inf)[nearest] # next larger
nearest <- c(-Inf, arbitrary.numbers)[nearest] # next smaller
takehi <- (hi - numbers) < (numbers - nearest) # larger better than smaller
nearest[takehi] <- hi[takehi] # now nearest is really nearest
snap <- abs(nearest - numbers) <= range # only snap near numbers
numbers[snap] <- nearest[snap] # snap values to nearest
print(numbers)
In this code, nearest
really ends up being the nearest number. This is achieved by considering both endpoints of every interval. In spirit, this is very much like the version by nograpes, but it avoids using ifelse
and NA
, which should benefit performance as it reduces the number of branching instructions.
Is this what you want?
> idx <- abs(outer(arbitrary.numbers, numbers, `-`)) <= (range+.Machine$double.eps)
> rounded <- arbitrary.numbers[apply(rbind(idx, colSums(idx) == 0), 2, which)]
> ifelse(is.na(rounded), numbers, rounded)
[1] 1.5 1.5 1.5 1.3 1.2 1.0 1.0 1.0 0.8 0.7 0.6 0.4 0.4 0.4 0.2 0.1 0.4 1.5
Please note that due to rounding errors (most likely), I use range = 0.1000001 to achieve the expected effect.
range <- range + 0.0000001
blah <- rbind( numbers, sapply( numbers, function( x ) abs( x - arbitrary.numbers ) ) )
ff <- function( y ) { if( min( y[-1] ) <= range + 0.000001 ) arbitrary.numbers[ which.min( y[ -1 ] ) ] else y[1] }
apply( blah, 2, ff )
Another option:
arb.round <- function(numbers, arbitrary.numbers, range) {
arrnd <- function(x, ns, r){
ifelse(abs(x - ns) <= range +.00000001, ns, x)
}
lapply(1:length(arbitrary.numbers), function(i){
numbers <<- arrnd(numbers, arbitrary.numbers[i], range)
}
)
numbers
}
arb.round(numbers, arbitrary.numbers, range)
Yields:
> arb.round(numbers, arbitrary.numbers, range)
[1] 1.5 1.5 1.5 1.3 1.2 1.0 1.0 1.0 0.8 0.7 0.6 0.4 0.4 0.4 0.2 0.1 0.4 1.5
EDIT: I removed the return call at the end of the function as it's not necessary adn can burn time.
EDIT: I think a loop will be even faster here:
loop.round <- function(numbers, arbitrary.numbers, range) {
arrnd <- function(x, ns, r){
ifelse(abs(x - ns) <= range +.00000001, ns, x)
}
for(i in seq_along(arbitrary.numbers)){
numbers <- arrnd(numbers, arbitrary.numbers[i], range)
}
numbers
}
This is still shorter:
sapply(numbers, function(x) ifelse(min(abs(arbitrary.numbers - x)) >
range + .Machine$double.eps, x, arbitrary.numbers[which.min
(abs(arbitrary.numbers - x))] ))
Thanks @MvG
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With