Given 2 data frames that are identical in terms of column names/datatypes, where some columns uniquely identify the rows, is there an efficient function/method for one data.frame to "update" the other?
For example, in the following, original and replacement are identified by 'Name' and 'Id'. goal is the result of finding all rows from replacement in original (by the unique ids) and replacing their Value1 and Value2 with the values from replacement.
original = data.frame( Name = c("joe","john") , Id = c( 1 , 2) , Value1 = c(1.2,NA), Value2 = c(NA,9.2) )
replacement = data.frame( Name = c("john") , Id = 2 , Value1 = 2.2 , Value2 = 5.9)
goal = data.frame( Name = c("joe","john") , Id = c( 1 , 2) , Value1 = c(1.2,2.2), Value2 = c(NA,5.9) )
The solution should work for an original and replacement of arbitrary length (although replacement should never have more rows than original). In practice, I'm using 2 id columns.
I'd use data.table objects. This code seems to work on your example:
library(data.table)
# set keys
original.dt <- data.table(original, key=c("Name", "Id"))
replacement.dt <- data.table(replacement, key=c("Name", "Id"))
goal2 <- original.dt
# subset and reassign
# goal2[replacement.dt[, list(Name, Id)]] <- replacement.dt
goal2[replacement.dt] <- replacement.dt # cleaner and faster, see Matthew's comment
goal2 <- as.data.frame(goal2)
identical(goal, goal2) # FALSE, why? See Joris's comment
all.equal(goal, goal2) # TRUE
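As an alternative to the keyed subassignment above, here is a minimal sketch of the same update written as a data.table "update join" with `:=`. It assumes the value column in replacement is spelled Value2, matching original, and that the pair (Name, Id) uniquely identifies rows. The names updated and repl are made up for this illustration.

```r
library(data.table)

original <- data.frame(Name = c("joe", "john"), Id = c(1, 2),
                       Value1 = c(1.2, NA), Value2 = c(NA, 9.2))
# Assumption: the column is named Value2, as in `original`.
replacement <- data.frame(Name = "john", Id = 2, Value1 = 2.2, Value2 = 5.9)

# as.data.table() copies, so `original` itself is left untouched.
updated <- as.data.table(original)
repl <- as.data.table(replacement)

# Update join: for rows of `updated` that match `repl` on the id columns,
# overwrite the value columns by reference. The `i.` prefix refers to
# columns of `repl`; rows without a match are not modified.
updated[repl, on = c("Name", "Id"),
        `:=`(Value1 = i.Value1, Value2 = i.Value2)]

updated <- as.data.frame(updated)
```

Because no key is set, the row order of original is preserved, and only the matched rows change. With more id columns, list them all in `on`.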