
Filtering a vector on condition

Tags: r, filter, vector

I am trying to filter a sorted vector of integers.
My condition is that the distance between two consecutive kept elements should be at least 100; if it is not, the element is removed and the next candidate is checked.
Here is an example:

set.seed(42)
input <- sort(sample(1:1000, 20))
head(input, 20)


[1] 24  49  74 128 146 153 165 228 303 321 356 410 532 561 601 622 634 839 882 997

Starting from the first element, 24, I would like to keep the first element that is at least 100 away from it.
In this case, that is 128.

Then, from 128, repeat the same process.
The result should be:

24 128 228 356 532 634 839 997

I managed to write a quick and dirty loop that gives the correct result, but I suspect it would not be very efficient for very large vectors...

result <- integer(length(input))
result[1] <- input[1]
for(i in seq_along(input)[-1]) {
  if(is.na(input[2])) break

  if(input[2] - input[1] < 100) {
    input <- input[-2]
  } else {
    result[i] <- input[2]
    input <- input[-1]
  }
}

result <- result[result != 0]

What would be an efficient way to get the expected result? Can it be done using vectorization?

i94pxoe asked Jun 07 '19 18:06


2 Answers

unique(Reduce(function(x, y) if (y - x >= 100) y else x, input, accumulate = TRUE))
#> [1]  24 128 228 356 532 634 839 997
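
In case it helps to see what happens inside, here is a rough sketch of the intermediate step. `Reduce(..., accumulate = TRUE)` carries the last kept value forward, so every position holds the most recent element that cleared the 100 gap, and `unique()` then collapses the repeats. The name `carried` is only for illustration, and the input is typed out by hand so the snippet does not depend on `set.seed()`.

```r
input <- c(24, 49, 74, 128, 146, 153, 165, 228, 303, 321,
           356, 410, 532, 561, 601, 622, 634, 839, 882, 997)

# x is the last kept value, y is the current candidate
carried <- Reduce(function(x, y) if (y - x >= 100) y else x,
                  input, accumulate = TRUE)

# Each kept value repeats until a later candidate is at least 100 past it
print(carried)

# Dropping the repeats leaves the filtered vector
print(unique(carried))
#> [1]  24 128 228 356 532 634 839 997
```

Note that this relies on the input being sorted in increasing order, as it is in the question.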
KU99 answered Oct 18 '22 12:10


Not thoroughly tested, but I believe this gets you there. I am using purrr::accumulate to keep a running sum of the gaps between consecutive elements, restarting the sum once it reaches 100. This is a pretty neat problem :-) I'm hoping to see some other solutions/approaches, so maybe leave this open (unanswered) for a bit...

library(purrr)

input <- c(24, 49, 74, 128, 146, 153, 165, 228, 303, 321, 356, 410, 532, 561, 601, 622, 634, 839, 882, 997)
idx <- which(accumulate(diff(input), ~ ifelse(.x >= 100, .y, .x + .y)) >= 100)
input[c(1, idx + 1)]
#> [1]  24 128 228 356 532 634 839 997
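
If purrr is not available, the same running-gap idea can probably be sketched in base R, with `Reduce()` standing in for `purrr::accumulate()`. The names `gaps`, `running` and `result` are just illustrative, and this assumes `input` is sorted and has at least two elements.

```r
input <- c(24, 49, 74, 128, 146, 153, 165, 228, 303, 321,
           356, 410, 532, 561, 601, 622, 634, 839, 882, 997)

# Gaps between consecutive elements
gaps <- diff(input)

# Running sum of gaps that restarts right after it has reached 100
running <- Reduce(function(acc, gap) if (acc >= 100) gap else acc + gap,
                  gaps, accumulate = TRUE)

# A running sum >= 100 at position i means input[i + 1] is kept
idx <- which(running >= 100)
result <- input[c(1, idx + 1)]
print(result)
#> [1]  24 128 228 356 532 634 839 997
```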

And to make this read a little more purrr, I suppose we could do:

library(dplyr)  # if_else() comes from dplyr, not purrr

accumulate(diff(input), ~ if_else(.x >= 100, .y, .x + .y)) %>%
  map_lgl(~ . >= 100) %>%
  which() %>%
  { input[c(1, . + 1)] }
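
For comparison, here is a minimal sketch of the same greedy rule as a plain single-pass loop. The loop in the question shrinks `input` with `input[-2]` on each iteration, which copies the vector every time; marking elements in a logical mask avoids that. `filter_min_gap` and `min_gap` are hypothetical names, and the function assumes `x` is sorted in increasing order with no `NA` values.

```r
# Hypothetical helper: keep each element that is at least `min_gap`
# past the last kept one. Assumes `x` is sorted and contains no NA.
filter_min_gap <- function(x, min_gap = 100) {
  if (length(x) == 0) return(x)
  keep <- logical(length(x))  # all FALSE to start
  keep[1] <- TRUE             # the first element is always kept
  last <- x[1]
  for (i in seq_along(x)[-1]) {
    if (x[i] - last >= min_gap) {
      keep[i] <- TRUE
      last <- x[i]
    }
  }
  x[keep]
}

input <- c(24, 49, 74, 128, 146, 153, 165, 228, 303, 321,
           356, 410, 532, 561, 601, 622, 634, 839, 882, 997)
print(filter_min_gap(input))
#> [1]  24 128 228 356 532 634 839 997
```

This is still a loop rather than true vectorization, since each decision depends on the previously kept element.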
JasonAizkalns answered Oct 18 '22 12:10