Concatenate data frames together based on similar column values

Tags:

r

Specifically, say I had three data frames d1, d2, d3:

d1:

       X     Y    Z    value
1      0    20   135    43
2      0     4   105    50
3      5    18    20    10
...

d2:

       X     Y    Z    value
1      0    20   135    15
2      0     4   105    14
3      2     9    12    16
...

d3:

       X     Y    Z    value
1      0    20   135    29
2      2     9    14    16
...

I want to be able to combine these data frames such that each row of the combined data frame consists of three values, based on all unique X, Y, Z combinations. If such an X, Y, Z combination does not exist in one of the original data frames then I just want it to have a value of null (or some arbitrarily low number if that isn't possible). So I'd want an output of:

dfinal:

       X     Y    Z    value1  value2  value3
1      0    20   135     43      15      29
2      0     4   105     50      14     null
3      5    18    20     10     null    null
4      2     9    12    null     16     null
5      2     9    14    null    null     16
...

Is there an efficient way of doing this? I tried using data.table, which seemed better suited to the task, but have yet to figure out how.

Leeren asked Feb 10 '23 05:02


2 Answers

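See ?merge; it should do the trick. From its documentation:

By default the data frames are merged on the columns with names they both have, but separate specifications of the columns can be given by by.x and by.y.

So, to keep only the X, Y, Z combinations that appear in both data frames:

    merge(d1, d2, by = c("X", "Y", "Z"))

To keep every combination, add all = TRUE. Missing values are filled with NA. Because both data frames have a column named value, the result names them value.x and value.y:

    merge(d1, d2, by = c("X", "Y", "Z"), all = TRUE)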
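The question involves three data frames rather than two, so the merge has to be applied repeatedly. A minimal sketch of one way to do that in base R, assuming the value columns should be renamed value1, value2, value3 as in the desired output (the helper names keys and frames are made up for illustration, and the data contains only the rows shown in the question):

```r
# Example data: only the rows visible in the question.
d1 <- data.frame(X = c(0, 0, 5), Y = c(20, 4, 18), Z = c(135, 105, 20), value = c(43, 50, 10))
d2 <- data.frame(X = c(0, 0, 2), Y = c(20, 4, 9), Z = c(135, 105, 12), value = c(15, 14, 16))
d3 <- data.frame(X = c(0, 2), Y = c(20, 9), Z = c(135, 14), value = c(29, 16))

keys <- c("X", "Y", "Z")      # hypothetical name for the join columns
frames <- list(d1, d2, d3)    # hypothetical name for the inputs

# Rename each "value" column to value1, value2, ... so the merged columns do not collide.
frames <- lapply(seq_along(frames), function(i) {
  df <- frames[[i]]
  names(df)[names(df) == "value"] <- paste0("value", i)
  df
})

# Fold the list with a full outer merge; combinations missing from a frame get NA.
dfinal <- Reduce(function(a, b) merge(a, b, by = keys, all = TRUE), frames)
print(dfinal)
```

Note that merge sorts the result by the key columns by default, so the rows will not necessarily appear in the order shown in the question.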
Arcoutte answered Feb 13 '23 03:02


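Take a look at dplyr and its join functions. Here is a small example:

library(dplyr)
library(data.table)

d1 <- data.table(X = c(1,2,3), Y = c(2,3,4), Z = c(8,3,9), value = c(22,3,44))
d2 <- data.table(X = c(1,4,3), Y = c(2,6,4), Z = c(8,9,9), value = c(44,22,11))

# Rename so the two value columns stay separate instead of acting as a join key
d2 <- rename(d2, value2 = value)

# Full outer join on the shared key columns; unmatched combinations get NA
full_join(d1, d2, by = c("X", "Y", "Z"))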

output:

  X Y Z value value2
1 1 2 8    22     44
2 2 3 3     3     NA
3 3 4 9    44     11
4 4 6 9    NA     22
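The same idea extends to the three data frames in the question by chaining two full joins. The following is only a sketch under the assumption that the columns should be called value1, value2, value3; it uses plain data frames and only the rows shown in the question, and keys is a name introduced here for illustration:

```r
library(dplyr)

# Example data: only the rows visible in the question.
d1 <- data.frame(X = c(0, 0, 5), Y = c(20, 4, 18), Z = c(135, 105, 20), value = c(43, 50, 10))
d2 <- data.frame(X = c(0, 0, 2), Y = c(20, 4, 9), Z = c(135, 105, 12), value = c(15, 14, 16))
d3 <- data.frame(X = c(0, 2), Y = c(20, 9), Z = c(135, 14), value = c(29, 16))

keys <- c("X", "Y", "Z")  # hypothetical name for the join columns

# Rename each value column, then chain full outer joins on the key columns.
dfinal <- d1 %>%
  rename(value1 = value) %>%
  full_join(rename(d2, value2 = value), by = keys) %>%
  full_join(rename(d3, value3 = value), by = keys)

print(dfinal)
```

Unlike base merge, full_join does not sort by the keys: rows from the first data frame come first, followed by the combinations that only appear in later ones.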
pfuhlert answered Feb 13 '23 02:02