Subset using i statement dynamically created from another data.table's variables

Question

I have data similar to the following:

set.seed(1)
dt <- data.table(ID=1:10, Status=c(rep("OUT",2),rep("IN",2),"ON",rep("OUT",2),rep("IN",2),"ON"), 
                 t1=round(rnorm(10),1), t2=round(rnorm(10),1), t3=round(rnorm(10),1), 
                 t4=round(rnorm(10),1), t5=round(rnorm(10),1), t6=round(rnorm(10),1),
                 t7=round(rnorm(10),1),t8=round(rnorm(10),1))

    ID Status   t1   t2   t3   t4   t5   t6   t7   t8
 1:  1    OUT -0.6  1.5  0.9  1.4 -0.2  0.4  2.4  0.5
 2:  2    OUT  0.2  0.4  0.8 -0.1 -0.3 -0.6  0.0 -0.7
 3:  3     IN -0.8 -0.6  0.1  0.4  0.7  0.3  0.7  0.6
 4:  4     IN  1.6 -2.2 -2.0 -0.1  0.6 -1.1  0.0 -0.9
 5:  5     ON  0.3  1.1  0.6 -1.4 -0.7  1.4 -0.7 -1.3
 6:  6    OUT -0.8  0.0 -0.1 -0.4 -0.7  2.0  0.2  0.3
 7:  7    OUT  0.5  0.0 -0.2 -0.4  0.4 -0.4 -1.8 -0.4
 8:  8     IN  0.7  0.9 -1.5 -0.1  0.8 -1.0  1.5  0.0
 9:  9     IN  0.6  0.8 -0.5  1.1 -0.1  0.6  0.2  0.1
10: 10     ON -0.3  0.6  0.4  0.8  0.9 -0.1  2.2 -0.6

I need to apply constraints to dt similar to the following (which are read in from a csv using fread):

dt_constraints <- data.table(columns=c("t1","t3","t7","t8"), operator=c(rep(">=",2),rep("<=",2)), 
                         values=c(-.6,-.5,2.4,.5))

    columns operator    values
1   t1       >=         -0.6
2   t3       >=         -0.5
3   t7       <=          2.4
4   t8       <=          0.5

I can easily subset dt by typing in the various constraints in the i statement:

dt_sub <- dt[t1>=-.6 & t3 >=-.5 & t7<=2.4 & t8<=.5,] 

    ID  Status  t1  t2  t3   t4  t5   t6    t7   t8
1   1   OUT   -0.6  1.5 0.9  1.4 -0.2  0.4  2.4  0.5
2   2   OUT    0.2  0.4 0.8 -0.1 -0.3 -0.6  0   -0.7
3   5   ON     0.3  1.1 0.6 -1.4 -0.7  1.4 -0.7 -1.3
4   7   OUT    0.5  0  -0.2 -0.4  0.4 -0.4 -1.8 -0.4
5   9   IN     0.6  0.8-0.5  1.1 -0.1  0.6  0.2  0.1
6   10  ON    -0.3  0.6 0.4  0.8  0.9 -0.1  2.2 -0.6

But, since the constraints are constantly changing (a new constrants csv is read in each time), I am looking for an efficient way to programatically apply the constraints directly from dt_constraints to subset dt. The actual data is quite large as is the number of constraints so efficiency is key.

Thanks so much.

Uwe · Accepted Answer

There is an alternative approach which uses non-equi joins for subsetting:

thresholds <- dt_constraints[, values]
cond <- dt_constraints[, paste0(columns, operator, "V", .I)]
dt[dt[as.list(thresholds), on = cond, which = TRUE]]

   ID Status   t1  t2   t3   t4   t5   t6   t7   t8
1:  1    OUT -0.6 1.5  0.9  1.4 -0.2  0.4  2.4  0.5
2:  2    OUT  0.2 0.4  0.8 -0.1 -0.3 -0.6  0.0 -0.7
3:  5     ON  0.3 1.1  0.6 -1.4 -0.7  1.4 -0.7 -1.3
4:  7    OUT  0.5 0.0 -0.2 -0.4  0.4 -0.4 -1.8 -0.4
5:  9     IN  0.6 0.8 -0.5  1.1 -0.1  0.6  0.2  0.1
6: 10     ON -0.3 0.6  0.4  0.8  0.9 -0.1  2.2 -0.6

akrun · Answer

We can paste it as a single string and then do the eval

dt[eval(parse(text=do.call(paste, c(dt_constraints, collapse= ' & '))))]
#   ID Status   t1  t2   t3   t4   t5   t6   t7   t8
#1:  1    OUT -0.6 1.5  0.9  1.4 -0.2  0.4  2.4  0.5
#2:  2    OUT  0.2 0.4  0.8 -0.1 -0.3 -0.6  0.0 -0.7
#3:  5     ON  0.3 1.1  0.6 -1.4 -0.7  1.4 -0.7 -1.3
#4:  7    OUT  0.5 0.0 -0.2 -0.4  0.4 -0.4 -1.8 -0.4
#5:  9     IN  0.6 0.8 -0.5  1.1 -0.1  0.6  0.2  0.1
#6: 10     ON -0.3 0.6  0.4  0.8  0.9 -0.1  2.2 -0.6

If we are using tidyverse, then

library(dplyr)
dt %>% 
   filter(!!rlang::parse_expr(do.call(paste, c(dt_constraints, collapse= ' & '))))
#  ID Status   t1  t2   t3   t4   t5   t6   t7   t8
#1  1    OUT -0.6 1.5  0.9  1.4 -0.2  0.4  2.4  0.5
#2  2    OUT  0.2 0.4  0.8 -0.1 -0.3 -0.6  0.0 -0.7
#3  5     ON  0.3 1.1  0.6 -1.4 -0.7  1.4 -0.7 -1.3
#4  7    OUT  0.5 0.0 -0.2 -0.4  0.4 -0.4 -1.8 -0.4
#5  9     IN  0.6 0.8 -0.5  1.1 -0.1  0.6  0.2  0.1
#6 10     ON -0.3 0.6  0.4  0.8  0.9 -0.1  2.2 -0.6

Subset using i statement dynamically created from another data.table's variables

Tags:

r

data.table

FG7

2 Answers

Uwe

akrun

Recent Activity

Donate For Us

Subset using i statement dynamically created from another data.table's variables

Tags:

r

data.table

FG7

2 Answers

Uwe

akrun

Related questions

Recent Activity

Donate For Us