I have data similar to the following:
set.seed(1)
dt <- data.table(ID=1:10, Status=c(rep("OUT",2),rep("IN",2),"ON",rep("OUT",2),rep("IN",2),"ON"),
t1=round(rnorm(10),1), t2=round(rnorm(10),1), t3=round(rnorm(10),1),
t4=round(rnorm(10),1), t5=round(rnorm(10),1), t6=round(rnorm(10),1),
t7=round(rnorm(10),1),t8=round(rnorm(10),1))
ID Status t1 t2 t3 t4 t5 t6 t7 t8
1: 1 OUT -0.6 1.5 0.9 1.4 -0.2 0.4 2.4 0.5
2: 2 OUT 0.2 0.4 0.8 -0.1 -0.3 -0.6 0.0 -0.7
3: 3 IN -0.8 -0.6 0.1 0.4 0.7 0.3 0.7 0.6
4: 4 IN 1.6 -2.2 -2.0 -0.1 0.6 -1.1 0.0 -0.9
5: 5 ON 0.3 1.1 0.6 -1.4 -0.7 1.4 -0.7 -1.3
6: 6 OUT -0.8 0.0 -0.1 -0.4 -0.7 2.0 0.2 0.3
7: 7 OUT 0.5 0.0 -0.2 -0.4 0.4 -0.4 -1.8 -0.4
8: 8 IN 0.7 0.9 -1.5 -0.1 0.8 -1.0 1.5 0.0
9: 9 IN 0.6 0.8 -0.5 1.1 -0.1 0.6 0.2 0.1
10: 10 ON -0.3 0.6 0.4 0.8 0.9 -0.1 2.2 -0.6
I need to apply constraints to dt similar to the following (which are read in from a csv using fread):
dt_constraints <- data.table(columns=c("t1","t3","t7","t8"), operator=c(rep(">=",2),rep("<=",2)),
values=c(-.6,-.5,2.4,.5))
columns operator values
1 t1 >= -0.6
2 t3 >= -0.5
3 t7 <= 2.4
4 t8 <= 0.5
I can easily subset dt by typing in the various constraints in the i statement:
dt_sub <- dt[t1>=-.6 & t3 >=-.5 & t7<=2.4 & t8<=.5,]
ID Status t1 t2 t3 t4 t5 t6 t7 t8
1 1 OUT -0.6 1.5 0.9 1.4 -0.2 0.4 2.4 0.5
2 2 OUT 0.2 0.4 0.8 -0.1 -0.3 -0.6 0 -0.7
3 5 ON 0.3 1.1 0.6 -1.4 -0.7 1.4 -0.7 -1.3
4 7 OUT 0.5 0 -0.2 -0.4 0.4 -0.4 -1.8 -0.4
5 9 IN 0.6 0.8-0.5 1.1 -0.1 0.6 0.2 0.1
6 10 ON -0.3 0.6 0.4 0.8 0.9 -0.1 2.2 -0.6
But, since the constraints are constantly changing (a new constrants csv is read in each time), I am looking for an efficient way to programatically apply the constraints directly from dt_constraints to subset dt. The actual data is quite large as is the number of constraints so efficiency is key.
Thanks so much.
There is an alternative approach which uses non-equi joins for subsetting:
thresholds <- dt_constraints[, values]
cond <- dt_constraints[, paste0(columns, operator, "V", .I)]
dt[dt[as.list(thresholds), on = cond, which = TRUE]]
ID Status t1 t2 t3 t4 t5 t6 t7 t8 1: 1 OUT -0.6 1.5 0.9 1.4 -0.2 0.4 2.4 0.5 2: 2 OUT 0.2 0.4 0.8 -0.1 -0.3 -0.6 0.0 -0.7 3: 5 ON 0.3 1.1 0.6 -1.4 -0.7 1.4 -0.7 -1.3 4: 7 OUT 0.5 0.0 -0.2 -0.4 0.4 -0.4 -1.8 -0.4 5: 9 IN 0.6 0.8 -0.5 1.1 -0.1 0.6 0.2 0.1 6: 10 ON -0.3 0.6 0.4 0.8 0.9 -0.1 2.2 -0.6
We can paste it as a single string and then do the eval
dt[eval(parse(text=do.call(paste, c(dt_constraints, collapse= ' & '))))]
# ID Status t1 t2 t3 t4 t5 t6 t7 t8
#1: 1 OUT -0.6 1.5 0.9 1.4 -0.2 0.4 2.4 0.5
#2: 2 OUT 0.2 0.4 0.8 -0.1 -0.3 -0.6 0.0 -0.7
#3: 5 ON 0.3 1.1 0.6 -1.4 -0.7 1.4 -0.7 -1.3
#4: 7 OUT 0.5 0.0 -0.2 -0.4 0.4 -0.4 -1.8 -0.4
#5: 9 IN 0.6 0.8 -0.5 1.1 -0.1 0.6 0.2 0.1
#6: 10 ON -0.3 0.6 0.4 0.8 0.9 -0.1 2.2 -0.6
If we are using tidyverse, then
library(dplyr)
dt %>%
filter(!!rlang::parse_expr(do.call(paste, c(dt_constraints, collapse= ' & '))))
# ID Status t1 t2 t3 t4 t5 t6 t7 t8
#1 1 OUT -0.6 1.5 0.9 1.4 -0.2 0.4 2.4 0.5
#2 2 OUT 0.2 0.4 0.8 -0.1 -0.3 -0.6 0.0 -0.7
#3 5 ON 0.3 1.1 0.6 -1.4 -0.7 1.4 -0.7 -1.3
#4 7 OUT 0.5 0.0 -0.2 -0.4 0.4 -0.4 -1.8 -0.4
#5 9 IN 0.6 0.8 -0.5 1.1 -0.1 0.6 0.2 0.1
#6 10 ON -0.3 0.6 0.4 0.8 0.9 -0.1 2.2 -0.6
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With