 

Merge two data.frames with replacement

I have two datasets. The first one is smaller but has more precise data. I need to join them so that:

1. If a value is present in Data1, I use only the data from Data1.
2. If it is missing from Data1 but present in Data2, I use the data from Data2.

Data1 <- data.frame(
    X = c(1,4,7,10,13,16),
    Y = c("a", "b", "c", "d", "e", "f")
)

Data2 <- data.frame(
    X = c(1:10),
    Y = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
)

So my data.frame should look like this:

DataJoin <- data.frame(
    X = c(1,4,7,10,13,16,7,8,9,10),
    Y = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
)

How can I do that? I've tried using merge from the base package and from the data.table package, but I couldn't get it to work the way I want.

Jot eN, asked Jan 12 '23 16:01


2 Answers

There's no join needed. You can reformulate the problem as "add the data found in Data2 and not found in Data1 to Data1". So simply do:

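# TRUE for each row of Data2 whose Y already appears in Data1
id <- Data2$Y %in% Data1$Y
# keep all of Data1, then append only the Data2 rows not covered by Data1
DataJoin <- rbind(Data1, Data2[!id, ])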

Gives:

> DataJoin
    X Y
1   1 a
2   4 b
3   7 c
4  10 d
5  13 e
6  16 f
7   7 g
8   8 h
9   9 i
10 10 j
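The same idea can be wrapped in a small reusable function. This is only a sketch: the name `prefer_first` and its `key` argument are hypothetical, it assumes a single key column, and `stringsAsFactors = FALSE` is set so that Y stays character on R versions before 4.0.

```r
# Hypothetical helper: keep every row of `primary`, then append the rows of
# `fallback` whose key value does not occur anywhere in `primary`.
prefer_first <- function(primary, fallback, key) {
  missing_rows <- !(fallback[[key]] %in% primary[[key]])
  out <- rbind(primary, fallback[missing_rows, , drop = FALSE])
  rownames(out) <- NULL  # renumber rows 1..n
  out
}

Data1 <- data.frame(X = c(1, 4, 7, 10, 13, 16),
                    Y = c("a", "b", "c", "d", "e", "f"),
                    stringsAsFactors = FALSE)
Data2 <- data.frame(X = 1:10, Y = letters[1:10], stringsAsFactors = FALSE)

DataJoin <- prefer_first(Data1, Data2, key = "Y")
print(DataJoin)
```

For a key spanning several columns, one option is to match on a pasted combination of those columns instead of a single `fallback[[key]]` vector.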
Joris Meys, answered Jan 14 '23 04:01


Using data.table:

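library(data.table)

# Data1$X is double while Data2$X is integer; make the types match before the update join
d1 <- data.table(Data1, key="Y")[, X := as.integer(X)]
d2 <- data.table(Data2, key="Y")

# copy d2 so that it doesn't get modified by reference
# i.X refers to column X of the table passed as 'i' (here d1)
ans <- copy(d2)[d1, X := i.X]
ans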
     X Y
 1:  1 a
 2:  4 b
 3:  7 c
 4: 10 d
 5: 13 e
 6: 16 f
 7:  7 g
 8:  8 h
 9:  9 i
10: 10 j
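Since the question mentions base `merge`, here is a sketch of how the same rule could be expressed with a full outer join followed by a fallback. The suffixes `.d1`/`.d2` are illustrative names chosen here, and note that `merge` sorts the result by Y, so row order may differ from the `rbind` approach on other data.

```r
Data1 <- data.frame(X = c(1, 4, 7, 10, 13, 16),
                    Y = c("a", "b", "c", "d", "e", "f"),
                    stringsAsFactors = FALSE)
Data2 <- data.frame(X = 1:10, Y = letters[1:10], stringsAsFactors = FALSE)

# Full outer join on Y; the two X columns come back as X.d1 and X.d2
m <- merge(Data1, Data2, by = "Y", all = TRUE, suffixes = c(".d1", ".d2"))

# Use Data1's X where it exists, otherwise fall back to Data2's X
m$X <- ifelse(is.na(m$X.d1), m$X.d2, m$X.d1)
DataJoin <- m[, c("X", "Y")]
print(DataJoin)
```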
Arun, answered Jan 14 '23 04:01