Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to perform complex multicolumn match in R /

I wish to match two dataframes based on conditionals on more than one column but cannot figure out how. So if there are my data sets:

df1 <- data.frame(lower=c(0,5,10,15,20), upper=c(4,9,14,19,24), x=c(12,45,67,89,10))
df2 <- data.frame(age=c(12, 14, 5, 2, 9, 19, 22, 18, 23))

I wish to match age from df2 that falls into the range between lower and upper in df1 with the aim to add an extra column to df2 containing the value of x in df1 where age lies between upper and lower. i.e. I want df2 to look like

age    x
12    67
14    67
 5    45
....etc. 

How can I achieve such a match ?

like image 796
user2498193 Avatar asked Dec 01 '22 14:12

user2498193


2 Answers

I would go with a simple sapply and a "anded" condition in the df1$x selection like this:

df2$x <- sapply( df2$age, function(x) { df1$x[ x >= df1$lower & x <= df1$upper ] })

which gives:

> df2
  age  x
1  12 67
2  14 67
3   5 45
4   2 12
5   9 45
6  19 89
7  22 10
8  18 89
9  23 10

For age 12 for example the selection inside the brackets gives:

> 12 >= df1$lower & 12 <= df1$upper
[1] FALSE FALSE  TRUE FALSE FALSE

So getting df1$x by this logical vector is easy as your ranges don't overlap

like image 144
Tensibai Avatar answered Dec 15 '22 11:12

Tensibai


Using foverlaps from data.table is what you are looking for:

library(data.table)
setDT(df1)
setDT(df2)[,age2:=age]
setkey(df1,lower,upper)
foverlaps(df2, df1, by.x = names(df2),by.y=c("lower","upper"))[,list(age,x)]

#    age  x
# 1:  12 67
# 2:  14 67
# 3:   5 45
# 4:   2 12
# 5:   9 45
# 6:  19 89
# 7:  22 10
# 8:  18 89
# 9:  23 10
like image 42
agstudy Avatar answered Dec 15 '22 10:12

agstudy