
Filter by ranges supplied by two vectors, without a join operation

I wish to do exactly this: Take dates from one dataframe and filter data in another dataframe - R

except without a join, because I am afraid the joined data will be too big to fit in memory before the filter is even applied.

Here is sample data:

tmp_df <- data.frame(a = 1:10)

I wish to do an operation that looks like this:

library(dplyr)
lower_bound <- c(2, 4)
upper_bound <- c(2, 5)
tmp_df %>%
    filter(a >= lower_bound & a <= upper_bound) # does not work: the bounds are recycled element-wise along a instead of being treated as intervals
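To see what actually goes wrong, here is the same comparison in plain R outside of filter(). This is a minimal illustration; the standalone vector a simply stands in for tmp_df$a.

```r
# Same bounds as above; `a` stands in for tmp_df$a
a <- 1:10
lower_bound <- c(2, 4)
upper_bound <- c(2, 5)

# R recycles the length-2 bounds along `a`: a[1] is tested against interval 1,
# a[2] against interval 2, a[3] against interval 1 again, and so on.
# Each element is checked against ONE interval, not against all of them.
mask <- a >= lower_bound & a <= upper_bound
which(mask)
# 4  (rows 2 and 5 are missed)
```

Because length(a) is a multiple of the number of bounds, R does not even warn about the recycling, so the wrong result is silent.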

and my desired result is:

> tmp_df[(tmp_df$a <= 2 & tmp_df$a >= 2) | (tmp_df$a <= 5 & tmp_df$a >= 4), , drop = F] 
# works, but writing one clause per interval is impractical for long bound vectors
  a
2 2
4 4
5 5

The memory problem with the linked join solution appears when tmp_df has many more rows and the lower_bound and upper_bound vectors have many more entries. A dplyr solution, or a solution that can be part of a pipe, is preferred.

asked Jun 19 '17 03:06 by Alex


2 Answers

Maybe you could borrow the inrange function from data.table, which "checks whether each value in x is in between any of the intervals provided in lower,upper."

Usage:

inrange(x, lower, upper, incbounds=TRUE)
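To make the documented semantics concrete, here is a base-R sketch of the same check. in_any_interval is a hypothetical helper written for illustration, not data.table's implementation; it only mimics the described behaviour, including incbounds (TRUE for [lower, upper], FALSE for (lower, upper)).

```r
# Hypothetical helper, NOT data.table's implementation: it only mimics the
# documented behaviour of inrange(x, lower, upper, incbounds).
in_any_interval <- function(x, lower, upper, incbounds = TRUE) {
  stopifnot(length(lower) == length(upper))
  keep <- logical(length(x))  # starts as all FALSE
  for (i in seq_along(lower)) {
    hit <- if (incbounds) {
      x >= lower[i] & x <= upper[i]   # closed interval [lower, upper]
    } else {
      x > lower[i] & x < upper[i]     # open interval (lower, upper)
    }
    keep <- keep | hit  # a row is kept if ANY interval contains it
  }
  keep
}

tmp_df <- data.frame(a = 1:10)
tmp_df[in_any_interval(tmp_df$a, c(2, 4), c(2, 5)), , drop = FALSE]
```

The loop trades time for memory: it makes one pass over x per interval, but only a couple of logical vectors of length(x) exist at any moment, so nothing of size rows-by-intervals is ever built.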

library(dplyr); library(data.table)

tmp_df %>% filter(inrange(a, c(2,4), c(2,5)))
#  a
#1 2
#2 4
#3 5
answered Sep 21 '22 20:09 by Psidom


If you'd like to stick with dplyr, it provides similar functionality through the between function.

library(dplyr)

# ranges I want to check between
my_ranges <- list(c(2,2), c(4,5), c(6,7))

tmp_df <- data.frame(a=1:10)
tmp_df %>% 
  filter(apply(bind_rows(lapply(my_ranges, 
                                # one row per range: is each a inside [x[1], x[2]]?
                                FUN=function(x, a){
                                  data.frame(t(between(a, x[1], x[2])))
                                  }, a)
                         ), 2, any))  # keep a row if any range contains it
  a
1 2
2 4
3 5
4 6
5 7

Just be aware that between always includes both boundaries; unlike the incbounds argument of inrange, there is no option to change that.
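A more compact variant of the same idea (a sketch, not part of the original answer): fold the ranges with base R's Reduce(), OR-ing one between() result at a time into a running logical vector. This avoids the ranges-by-rows data frame that bind_rows() builds. Since between() is always inclusive, the second call writes the comparison out by hand to get exclusive bounds; open_ranges is a hypothetical example set.

```r
library(dplyr)

my_ranges <- list(c(2, 2), c(4, 5), c(6, 7))
tmp_df <- data.frame(a = 1:10)

# Inclusive bounds: accumulate keep | between(...) one range at a time,
# starting from FALSE, so only one logical vector of length nrow(tmp_df) is kept.
incl <- tmp_df %>%
  filter(Reduce(function(keep, r) keep | between(a, r[1], r[2]),
                my_ranges, FALSE))
incl

# Exclusive bounds: between() cannot do this, so compare explicitly.
open_ranges <- list(c(1, 4), c(6, 9))  # hypothetical example ranges
excl <- tmp_df %>%
  filter(Reduce(function(keep, r) keep | (a > r[1] & a < r[2]),
                open_ranges, FALSE))
excl
```

With my_ranges this keeps a = 2, 4, 5, 6, 7, the same rows as the apply() version, and the exclusive call keeps a = 2, 3, 7, 8.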

answered Sep 19 '22 20:09 by Steven M. Mortimer