R: merge based on multiple conditions (with non-equal criteria)

Tags:

I would like to merge 2 data frames based on multiple conditions.

DF1 <- data.frame("col1" = rep(c("A","B"), 18),
                  "col2" = rep(c("C","D","E"), 12),
                  "value"= (sample(1:100,36)),
                  "col4" = rep(NA,36))

DF2 <- data.frame("col1" = rep("A",6),
                  "col2" = rep(c("C","D"),3),
                  "data" = rep(c(1,3),3),
                  "min" = seq(0,59,by=10),
                  "max" = seq(10,69,by=10))


> DF1
   col1 col2 value col4
1     A    C    22   NA
2     B    D    58   NA
3     A    E    35   NA
4     B    C    86   NA
5     A    D    37   NA
6     B    E    16   NA
7     A    C    46   NA
8     B    D    23   NA
9     A    E    88   NA
10    B    C     3   NA
11    A    D    33   NA
12    B    E    25   NA
13    A    C    19   NA
14    B    D    24   NA
15    A    E     9   NA
16    B    C    76   NA
17    A    D    62   NA
18    B    E    68   NA
19    A    C    97   NA
20    B    D    43   NA
21    A    E     8   NA
22    B    C    84   NA
23    A    D    36   NA
24    B    E    20   NA
25    A    C    57   NA
26    B    D    99   NA
27    A    E    42   NA
28    B    C    64   NA
29    A    D    87   NA
30    B    E     1   NA
31    A    C    78   NA
32    B    D    34   NA
33    A    E    41   NA
34    B    C    32   NA
35    A    D    10   NA
36    B    E    72   NA

> DF2
  col1 col2 data min max
1    A    C    1   0  10
2    A    D    3  10  20
3    A    C    1  20  30
4    A    D    3  30  40
5    A    C    1  40  50
6    A    D    3  50  60

DF1 is the main table and DF2 is treated as a lookup table

If col1 and col2 of DF1 match that of DF2, and 'value' of DF1 is in between min and max of DF2, then column 'data' from DF2 will be added to DF1. If the conditions are not met, 'data' of DF1 will have value of NA.

Expected output (first 6 rows):

  col1 col2 value col4 data
1    A    C    22   NA    1
2    B    D    58   NA   NA
3    A    E    35   NA   NA
4    B    C    86   NA   NA
5    A    D    37   NA    3
6    B    E    16   NA   NA

I've tried using merge (to match col1 snd col2) then subset (to filter only rows that have value in between min and max) , but my goal is to maintain all the rows of DF1.

Anyone has an idea on this ?

276

asked Jul 27 '17 12:07

Tara Sutjarittham

1 Answers

With the recent versions of data.table, non-equi joins and update on join are possible:

library(data.table)
head(setDT(DF1)[setDT(DF2), on = c("col1", "col2", "value>=min", "value<=max"), 
                data := data])

   rn col1 col2 value col4 data
1:  1    A    C    22   NA    1
2:  2    B    D    58   NA   NA
3:  3    A    E    35   NA   NA
4:  4    B    C    86   NA   NA
5:  5    A    D    37   NA    3
6:  6    B    E    16   NA   NA

Data

DF1 <- structure(list(rn = 1:36, col1 = c("A", "B", "A", "B", "A", "B", 
"A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", 
"B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", 
"A", "B", "A", "B"), col2 = c("C", "D", "E", "C", "D", "E", "C", 
"D", "E", "C", "D", "E", "C", "D", "E", "C", "D", "E", "C", "D", 
"E", "C", "D", "E", "C", "D", "E", "C", "D", "E", "C", "D", "E", 
"C", "D", "E"), value = c(22L, 58L, 35L, 86L, 37L, 16L, 46L, 
23L, 88L, 3L, 33L, 25L, 19L, 24L, 9L, 76L, 62L, 68L, 97L, 43L, 
8L, 84L, 36L, 20L, 57L, 99L, 42L, 64L, 87L, 1L, 78L, 34L, 41L, 
32L, 10L, 72L), col4 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("rn", 
"col1", "col2", "value", "col4"), row.names = c(NA, -36L), class = "data.frame")
DF2 <- structure(list(rn = 1:6, col1 = c("A", "A", "A", "A", "A", "A"
), col2 = c("C", "D", "C", "D", "C", "D"), data = c(1L, 3L, 1L, 
3L, 1L, 3L), min = c(0L, 10L, 20L, 30L, 40L, 50L), max = c(10L, 
20L, 30L, 40L, 50L, 60L)), .Names = c("rn", "col1", "col2", "data", 
"min", "max"), row.names = c(NA, -6L), class = "data.frame")

174

answered Nov 14 '22 21:11

Uwe

Related questions
                            
                                bit64 integers with fst
                            
                                Using discrete custom color in a plotly heatmap
                            
                                How to send json response using plumber R
                            
                                Replace Values in Dataframe Column based on match in second data frame columns
                            
                                How to remove more than 2 consecutive NA's in a column?
                            
                                Compilation error using Rcpp with typedef
                            
                                Change parameter values at time step in deSolve
                            
                                Plot the positive infinity symbol and negative infinity symbol
                            
                                Draw a map of a specific country with leaflet
                            
                                ggplot not plotting the correct color [duplicate]
                            
                                Multiply each column of a data frame by the corresponding value of a vector [duplicate]
                            
                                Change the input value in shiny from server
                            
                                Efficient way to insert data frame from R to SQL
                            
                                Read the file created/modified last in different directories in R
                            
                                Numerical Triple Integration in R
                            
                                R ODBC - Querying Column name with spaces
                            
                                Extract rows that have duplicates for certain column but are unique in another column
                            
                                how to replace a character INSIDE the text content of many files automatically?
                            
                                Why doesn't dplyr filter() work within function (i.e. using variable for column name)?
                            
                                run R as administrator

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

R: merge based on multiple conditions (with non-equal criteria)

Tags:

merge

dataframe

r

Tara Sutjarittham

People also ask

1 Answers

Data

Uwe

Recent Activity

Donate For Us