I am trying to filter a vector of integers.
My condition is that the distance between two consecutive kept elements should be at least 100; if it is not, remove the element and look at the next candidate.
Here is an example:
set.seed(42)
input <- sort(sample(1:1000, 20))
head(input, 20)
[1] 24 49 74 128 146 153 165 228 303 321 356 410 532 561 601 622 634 839 882 997
If I start from the first element, 24, I would like to keep the first element that has a distance of at least 100 from it.
In this case, it would be 128.
Then, from 128, repeat the same process.
The result should be:
24 128 228 356 532 634 839 997
I managed to write a quick-and-dirty loop that gives the correct result, but I suspect it would not be very efficient for very large vectors:
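result <- integer(length(input))
result[1] <- input[1]
for (i in seq_along(input)[-1]) {
  # stop once there is no candidate left after the last kept value
  if (is.na(input[2])) break
  if (input[2] - input[1] < 100) {
    # candidate is too close: drop it and try the next one
    input <- input[-2]
  } else {
    # candidate is far enough: keep it and make it the new reference
    result[i] <- input[2]
    input <- input[-1]
  }
}
# remove the unused (zero) slots
result <- result[result != 0]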
What would be an efficient way to get the expected result? Can it be done using vectorization?
unique(Reduce(function(x, y) ifelse(y - x >= 100, y, x), input, accumulate = TRUE))
[1] 24 128 228 356 532 634 839 997
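To see why the Reduce call works, it can help to look at the intermediate vector before unique() is applied. The following is only a sketch: the input is hard-coded from the question so that it does not depend on the RNG, the names keep_or_carry and carried are made up for illustration, and it assumes the input is sorted in increasing order.

```r
input <- c(24, 49, 74, 128, 146, 153, 165, 228, 303, 321,
           356, 410, 532, 561, 601, 622, 634, 839, 882, 997)

# x is the last kept value, y is the current candidate:
# move on to y only if it is at least 100 away, otherwise carry x forward
keep_or_carry <- function(x, y) if (y - x >= 100) y else x

# one value per input element: the kept value "in force" at that position
carried <- Reduce(keep_or_carry, input, accumulate = TRUE)
carried
#>  [1]  24  24  24 128 128 128 128 228 228 228 356 356 532 532 532 532 634 839 839 997

# each kept value appears as a run, so unique() collapses the runs
result <- unique(carried)
```

Because every kept value is strictly larger than the previous one, unique() cannot accidentally merge two different kept elements as long as the input is sorted.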
Not thoroughly tested, but I believe this gets you there. I am using purrr::accumulate. This is a pretty neat problem :-) I'm hoping to see some other solutions/approaches, so maybe leave this open (unanswered) for a bit...
library(purrr)
input <- c(24, 49, 74, 128, 146, 153, 165, 228, 303, 321, 356, 410, 532, 561, 601, 622, 634, 839, 882, 997)
idx <- which(accumulate(diff(input), ~ ifelse(.x >= 100, .y, .x + .y)) >= 100)
input[c(1, idx + 1)]
#> [1] 24 128 228 356 532 634 839 997
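The idea here is a running sum of the gaps that resets each time it reaches 100. As a rough illustration, the sketch below prints that intermediate running sum. It swaps in base R's Reduce(..., accumulate = TRUE) for purrr::accumulate so that it runs without extra packages (for this use the two should behave the same, but that is an assumption); running_gap is a made-up name.

```r
input <- c(24, 49, 74, 128, 146, 153, 165, 228, 303, 321,
           356, 410, 532, 561, 601, 622, 634, 839, 882, 997)

gaps <- diff(input)

# acc is the distance travelled since the last kept element;
# once it has reached 100, start counting again from the current gap
running_gap <- Reduce(function(acc, gap) if (acc >= 100) gap else acc + gap,
                      gaps, accumulate = TRUE)
running_gap
#>  [1]  25  50 104  18  25  37 100  75  93 128  54 176  29  69  90 102 205  43 158

# positions where the running gap reached 100 mark the kept elements
idx <- which(running_gap >= 100)
result <- input[c(1, idx + 1)]
```

Position k of running_gap refers to input[k + 1], which is why the index is shifted by one before subsetting.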
And to make this read a little more purrr, I suppose we could do:
library(dplyr)  # if_else() comes from dplyr, not purrr
accumulate(diff(input), ~ if_else(.x >= 100, .y, .x + .y)) %>%
  map_lgl(~ . >= 100) %>%
  which %>%
  { input[c(1, . + 1)] }
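Since each decision depends on the previously kept element, the approaches above still walk the vector one element at a time. For very large vectors, a plain single-pass loop that marks positions in a logical mask, instead of shrinking the vector on every step, may also be worth trying. This is only a sketch, not a benchmarked solution; greedy_filter and min_gap are hypothetical names, and the input is assumed to be sorted.

```r
# Hypothetical helper: keep the first element, then every element that is
# at least min_gap away from the most recently kept one.
greedy_filter <- function(x, min_gap = 100) {
  if (length(x) == 0) return(x)
  keep <- logical(length(x))   # preallocated mask, all FALSE
  keep[1] <- TRUE
  last <- x[1]                 # most recently kept value
  for (i in seq_along(x)[-1]) {
    if (x[i] - last >= min_gap) {
      keep[i] <- TRUE
      last <- x[i]
    }
  }
  x[keep]
}

input <- c(24, 49, 74, 128, 146, 153, 165, 228, 303, 321,
           356, 410, 532, 561, 601, 622, 634, 839, 882, 997)
result <- greedy_filter(input)
```

With the example input this should give the same eight values as the other answers, and the work per element stays constant because nothing is copied inside the loop.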