I would like to merge 2 data frames based on multiple conditions.
DF1 <- data.frame("col1" = rep(c("A","B"), 18),
"col2" = rep(c("C","D","E"), 12),
"value"= (sample(1:100,36)),
"col4" = rep(NA,36))
DF2 <- data.frame("col1" = rep("A",6),
"col2" = rep(c("C","D"),3),
"data" = rep(c(1,3),3),
"min" = seq(0,59,by=10),
"max" = seq(10,69,by=10))
> DF1
col1 col2 value col4
1 A C 22 NA
2 B D 58 NA
3 A E 35 NA
4 B C 86 NA
5 A D 37 NA
6 B E 16 NA
7 A C 46 NA
8 B D 23 NA
9 A E 88 NA
10 B C 3 NA
11 A D 33 NA
12 B E 25 NA
13 A C 19 NA
14 B D 24 NA
15 A E 9 NA
16 B C 76 NA
17 A D 62 NA
18 B E 68 NA
19 A C 97 NA
20 B D 43 NA
21 A E 8 NA
22 B C 84 NA
23 A D 36 NA
24 B E 20 NA
25 A C 57 NA
26 B D 99 NA
27 A E 42 NA
28 B C 64 NA
29 A D 87 NA
30 B E 1 NA
31 A C 78 NA
32 B D 34 NA
33 A E 41 NA
34 B C 32 NA
35 A D 10 NA
36 B E 72 NA
> DF2
col1 col2 data min max
1 A C 1 0 10
2 A D 3 10 20
3 A C 1 20 30
4 A D 3 30 40
5 A C 1 40 50
6 A D 3 50 60
DF1 is the main table and DF2 is treated as a lookup table
If col1 and col2 of DF1 match that of DF2, and 'value' of DF1 is in between min and max of DF2, then column 'data' from DF2 will be added to DF1. If the conditions are not met, 'data' of DF1 will have value of NA.
Expected output (first 6 rows):
col1 col2 value col4 data
1 A C 22 NA 1
2 B D 58 NA NA
3 A E 35 NA NA
4 B C 86 NA NA
5 A D 37 NA 3
6 B E 16 NA NA
I've tried using merge (to match col1 snd col2) then subset (to filter only rows that have value in between min and max) , but my goal is to maintain all the rows of DF1.
Anyone has an idea on this ?
The merge() function in base R can be used to merge input dataframes by common columns or row names. The merge() function retains all the row names of the dataframes, behaving similarly to the inner join. The dataframes are combined in order of the appearance in the input function call.
The data frame rows can be subjected to multiple conditions by combining them using logical operators, like AND (&) , OR (|).
You can merge columns, by adding new variables; or you can merge rows, by adding observations. To add columns use the function merge() which requires that datasets you will merge to have a common variable. In case that datasets doesn't have a common variable use the function cbind .
With the recent versions of data.table
, non-equi joins and update on join are possible:
library(data.table)
head(setDT(DF1)[setDT(DF2), on = c("col1", "col2", "value>=min", "value<=max"),
data := data])
rn col1 col2 value col4 data 1: 1 A C 22 NA 1 2: 2 B D 58 NA NA 3: 3 A E 35 NA NA 4: 4 B C 86 NA NA 5: 5 A D 37 NA 3 6: 6 B E 16 NA NA
DF1 <- structure(list(rn = 1:36, col1 = c("A", "B", "A", "B", "A", "B",
"A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A",
"B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B",
"A", "B", "A", "B"), col2 = c("C", "D", "E", "C", "D", "E", "C",
"D", "E", "C", "D", "E", "C", "D", "E", "C", "D", "E", "C", "D",
"E", "C", "D", "E", "C", "D", "E", "C", "D", "E", "C", "D", "E",
"C", "D", "E"), value = c(22L, 58L, 35L, 86L, 37L, 16L, 46L,
23L, 88L, 3L, 33L, 25L, 19L, 24L, 9L, 76L, 62L, 68L, 97L, 43L,
8L, 84L, 36L, 20L, 57L, 99L, 42L, 64L, 87L, 1L, 78L, 34L, 41L,
32L, 10L, 72L), col4 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("rn",
"col1", "col2", "value", "col4"), row.names = c(NA, -36L), class = "data.frame")
DF2 <- structure(list(rn = 1:6, col1 = c("A", "A", "A", "A", "A", "A"
), col2 = c("C", "D", "C", "D", "C", "D"), data = c(1L, 3L, 1L,
3L, 1L, 3L), min = c(0L, 10L, 20L, 30L, 40L, 50L), max = c(10L,
20L, 30L, 40L, 50L, 60L)), .Names = c("rn", "col1", "col2", "data",
"min", "max"), row.names = c(NA, -6L), class = "data.frame")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With