Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replace value in data frame with value from other data frame based on set of conditions

Tags:

r

In df1 I need to replace values for msec with corresponding values in df2.

df1 <- data.frame(ID=c('rs', 'rs', 'rs', 'tr','tr','tr'), cond=c(1,1,2,1,1,2), 
block=c(2,2,4,2,2,4), correct=c(1,0,1,1,1,0), msec=c(456,678,756,654,625,645))

df2 <- data.frame(ID=c('rs', 'rs', 'tr','tr'), cond=c(1,2,1,2), 
block=c(2,4,2,4), mean=c(545,664,703,765))

In df1, if correct==0, then reference df2 with the matching values of ID, cond, and block. Replace the value for msec in df1 with the corresponding value for mean in df2.

For example, the second row in df1 has correct==0. So, in df2 find the corresponding row where ID=='rs', cond==1, block==2 and use the value for mean (mean=545) to replace the value for msec (msec=678). Note that in df1 combinations of ID, block, and cond can repeat, but each combination occurs only once in df2.

like image 203
rds Avatar asked Apr 21 '17 14:04

rds


2 Answers

Using the data.table package:

# load the 'data.table' package
library(data.table)

# convert the data.frame's to data.table's
setDT(df1)
setDT(df2)

# update df1 by reference with a join with df2
df1[df2[, correct := 0], on = .(ID, cond, block, correct), msec := i.mean]

which gives:

> df1
   ID cond block correct msec
1: rs    1     2       1  456
2: rs    1     2       0  545
3: rs    2     4       1  756
4: tr    1     2       1  654
5: tr    1     2       1  625
6: tr    2     4       0  765

Note: The above code will update df1 instead of creating a new dataframe, which is more memory-efficient.

like image 94
Jaap Avatar answered Sep 22 '22 03:09

Jaap


One option would be to use base R with an interaction() and a match(). How about:

df1[which(df1$correct==0),"msec"] <- df2[match(interaction(df1[which(df1$correct==0),c("ID","cond","block")]), 
                                               interaction(df2[,c("ID","cond", "block")])),
                                         "mean"]

df1
#        ID cond block correct msec
#1 rs    1     2       1  456
#2 rs    1     2       0  545
#3 rs    2     4       1  756
#4 tr    1     2       1  654
#5 tr    1     2       1  625
#6 tr    2     4       0  765 

We overwrite the correct == 0 columns with their matched rows in df2$mean

Edit: Another option would be a sql merge this could look like:

library(sqldf)
merged <- sqldf('SELECT l.ID, l.cond, l.block, l.correct,
                        case when l.correct == 0 then r.mean else l.msec end as msec
                FROM df1 as l
                LEFT JOIN df2 as r
                ON l.ID = r.ID AND l.cond = r.cond AND l.block = r.block')


merged
  ID cond block correct msec
1 rs    1     2       1  456
2 rs    1     2       0  545
3 rs    2     4       1  756
4 tr    1     2       1  654
5 tr    1     2       1  625
6 tr    2     4       0  765
like image 26
Mike H. Avatar answered Sep 24 '22 03:09

Mike H.