I have 2 data.tables:
library(data.table)
dt1 <- data.table(id = 1:5, value1 = 11:15, value2 = 21:25, value3 = 36:40)
dt2 <- data.table(name = c("value1", "value1", "value1", "value1",
"value2", "value2", "value2", "value3", "value3"),
valueMin = c(10, 13, 14, 18, 21, 24, 25, 36, 38),
valueMax = c(13, 14, 18, 20, 24, 25, 27, 38, 42),
label = c(101:104, 201:203, 301:302))
> dt1
id value1 value2 value3
1: 1 11 21 36
2: 2 12 22 37
3: 3 13 23 38
4: 4 14 24 39
5: 5 15 25 40
> dt2
name valueMin valueMax label
1: value1 10 13 101
2: value1 13 14 102
3: value1 14 18 103
4: value1 18 20 104
5: value2 21 24 201
6: value2 24 25 202
7: value2 25 27 203
8: value3 36 38 301
9: value3 38 42 302
The result I expect: join label from dt2 to dt1 where value1 in dt1 lies between valueMin and valueMax in dt2 and dt2$name equals "value1".
Here is a solution I have (gives correct result):
varName <- "value1"
dt2_temp <- dt2[name == varName,]
dt1[dt2_temp, on = .(value1 > valueMin, value1 <= valueMax), nomatch = 0][, .(id, label)]
id label
1: 1 101
2: 2 101
3: 3 101
4: 4 102
5: 5 103
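The join conditions value1 > valueMin and value1 <= valueMax treat each interval as half-open, (valueMin, valueMax], so a boundary value such as 13 gets label 101 rather than 102. A minimal sketch of that matching rule for a single value follows. It assumes the half-open reading implied by the join above, and label_for is a hypothetical helper for illustration, not part of data.table:

```r
library(data.table)

dt2 <- data.table(name = c("value1", "value1", "value1", "value1",
                           "value2", "value2", "value2", "value3", "value3"),
                  valueMin = c(10, 13, 14, 18, 21, 24, 25, 36, 38),
                  valueMax = c(13, 14, 18, 20, 24, 25, 27, 38, 42),
                  label = c(101:104, 201:203, 301:302))

# Hypothetical helper: label(s) of the interval (valueMin, valueMax] of column `col` containing `v`
label_for <- function(col, v) {
  dt2[name == col & valueMin < v & v <= valueMax, label]
}

label_for("value1", 13)  # 101: 13 is the upper bound of (10, 13], not inside (13, 14]
label_for("value1", 14)  # 102
label_for("value2", 21)  # integer(0): 21 is not > 21, so no interval matches
```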
I would like to do the same (get the label columns) for the remaining columns of dt1 (value2, value3) in a loop, so I need to replace the reference to the column name value1 in the join with the name stored in varName, something like:
dt1[dt2_temp, on = .(varName > valueMin, varName <= valueMax), nomatch = 0]
Unfortunately, I did not succeed with plain varName, eval(varName) or as.name(varName). Do you have an idea how to solve this?
The error message is similar to:
Error in `[.data.table`(dt1, dt2_temp, on = .(varName > valueMin, varName <= valueMax), : Column(s) [varName,varName] not found in x
Another method is to construct the on argument programmatically as a character vector of conditions (see the on argument in ?data.table):
dt1[dt2_temp,
on=c(paste0(varName, ">valueMin"), paste0(varName, "<=valueMax")),
nomatch=0L]
Note that there should not be any space around the variable names.
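Because the on conditions are plain strings, the loop asked about in the question can be written directly. A sketch, assuming the goal is one new label column per value column, added to dt1 by reference with an update join; the label_ prefix for the new column names is an illustrative choice, not something from the question:

```r
library(data.table)

dt1 <- data.table(id = 1:5, value1 = 11:15, value2 = 21:25, value3 = 36:40)
dt2 <- data.table(name = c("value1", "value1", "value1", "value1",
                           "value2", "value2", "value2", "value3", "value3"),
                  valueMin = c(10, 13, 14, 18, 21, 24, 25, 36, 38),
                  valueMax = c(13, 14, 18, 20, 24, 25, 27, 38, 42),
                  label = c(101:104, 201:203, 301:302))

for (varName in c("value1", "value2", "value3")) {
  # intervals that belong to the current column
  dt2_temp <- dt2[name == varName]
  # update join with conditions built as strings, e.g. "value2>valueMin", "value2<=valueMax";
  # rows of dt1 without a matching interval get NA in the new column
  dt1[dt2_temp,
      on = c(paste0(varName, ">valueMin"), paste0(varName, "<=valueMax")),
      (paste0("label_", varName)) := i.label]
}

print(dt1)
```

Updating by reference avoids collecting and recombining one result per column; if a separate long table is preferred, the same string-built on vector works in a plain join inside lapply.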
Why not do it all in one go without a loop?
A possible solution:
melt(dt1, id = 1)[dt2, on = .(variable = name, value > valueMin, value <= valueMax), lbl := i.label
][, dcast(.SD, id ~ variable, value.var = c("value","lbl"))]
which gives:
id value_value1 value_value2 value_value3 lbl_value1 lbl_value2 lbl_value3
1: 1 11 21 36 101 NA NA
2: 2 12 22 37 101 201 301
3: 3 13 23 38 101 201 301
4: 4 14 24 39 102 201 302
5: 5 15 25 40 103 202 302
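To see what the update join does before dcast, the intermediate long table can be inspected. A sketch, with long as an illustrative name: melt gives one row per id and value column (15 rows), and lbl := i.label fills lbl only where an interval matches, leaving NA elsewhere. Those rows are where the NA entries in the wide result come from:

```r
library(data.table)

dt1 <- data.table(id = 1:5, value1 = 11:15, value2 = 21:25, value3 = 36:40)
dt2 <- data.table(name = c("value1", "value1", "value1", "value1",
                           "value2", "value2", "value2", "value3", "value3"),
                  valueMin = c(10, 13, 14, 18, 21, 24, 25, 36, 38),
                  valueMax = c(13, 14, 18, 20, 24, 25, 27, 38, 42),
                  label = c(101:104, 201:203, 301:302))

# one row per (id, value column): 5 ids x 3 columns = 15 rows
long <- melt(dt1, id.vars = "id")

# update join: lbl gets dt2$label where variable == name and valueMin < value <= valueMax;
# the value column itself is not modified, only lbl is added
long[dt2, on = .(variable = name, value > valueMin, value <= valueMax), lbl := i.label]

# the unmatched pairs: id 1 for value2 (21) and value3 (36)
print(long[is.na(lbl)])
```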
Another answer joins the long form of dt1 directly to dt2:
melt(dt1, 1)[dt2, on = .(value > valueMin, value <= valueMax, variable = name), nomatch = 0]
id variable value value.1 label
1: 1 value1 10 13 101
2: 2 value1 10 13 101
3: 3 value1 10 13 101
4: 4 value1 13 14 102
5: 5 value1 14 18 103
6: 2 value2 21 24 201
7: 3 value2 21 24 201
8: 4 value2 21 24 201
9: 5 value2 24 25 202
10: 2 value3 36 38 301
11: 3 value3 36 38 301
12: 4 value3 38 42 302
13: 5 value3 38 42 302
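In this result, value and value.1 hold the matched interval bounds (valueMin and valueMax), not the original values from dt1: in a non-equi join, data.table returns the i-side values for the join columns of x, which is why id 1 shows 10 and 13 rather than 11. If the original values are wanted, a sketch (assuming a data.table version that supports the x. prefix in j) selects them explicitly:

```r
library(data.table)

dt1 <- data.table(id = 1:5, value1 = 11:15, value2 = 21:25, value3 = 36:40)
dt2 <- data.table(name = c("value1", "value1", "value1", "value1",
                           "value2", "value2", "value2", "value3", "value3"),
                  valueMin = c(10, 13, 14, 18, 21, 24, 25, 36, 38),
                  valueMax = c(13, 14, 18, 20, 24, 25, 27, 38, 42),
                  label = c(101:104, 201:203, 301:302))

# Same join conditions as above; j picks x.value, the value column of the long
# table (x), instead of the join column, which holds valueMin after the join
res <- melt(dt1, id.vars = "id")[dt2,
  .(id, variable, value = x.value, label),
  on = .(value > valueMin, value <= valueMax, variable = name),
  nomatch = 0L]

print(res)
```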
One approach could be:
library(data.table)
dcast(dt2[melt(dt1, id.vars = 1), #left join of long form of dt1 and original dt2
.(id, variable, label), #keep only the relevant columns from the joined table
on = .(name = variable, valueMax >= value, valueMin < value)], #join conditions
id ~ variable,
value.var = "label") #long to wide format using dcast to get the final result
which gives
id value1 value2 value3
1: 1 101 NA NA
2: 2 101 201 301
3: 3 101 201 301
4: 4 102 201 302
5: 5 103 202 302