I need to merge data sets by row, but they have differing columns. How can I easily get R to combine the rows, add the missing columns, and fill those columns with NAs? Currently I do it like this, which is very time-consuming when there are many merges:
Creating fake data...
x1<-LETTERS[1:3]
x2<-letters[1:3]
x3<-rnorm(3)
x4<-rnorm(3)
x5<-rnorm(3)
Example of multiple data.frames with some shared columns and some different ones...
data.frame(x1,x2,x3,x4,x5)
data.frame(x1,x3,x4,x5)
data.frame(x2,x3,x4,x5)
data.frame(x1,x2,x3,x4,x5)
How I merge it now...
DF<-data.frame(rbind(data.frame(x1,x2,x3,x4,x5),
data.frame(x1,x2,x3,x4,x5),
data.frame("x2"=rep(NA,3),data.frame(x1,x3,x4,x5)),
data.frame("x1"=rep(NA,3),data.frame(x2,x3,x4,x5))))
DF
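The manual approach in the question can be generalized in base R: collect the union of all column names, add each missing column to every data frame as NA, then row-bind everything. This is a minimal sketch under that assumption; the helper name `rbind_pad` is hypothetical and does not come from any package. `set.seed(1)` is added only to make the fake data reproducible.

```r
# Hypothetical helper (not from any package): row-bind data frames whose
# columns differ, padding each one with NA columns for the names it lacks.
rbind_pad <- function(dfs) {
  # Union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    # Add every missing column as NA (recycled to the number of rows)
    for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
    df[all_cols]  # give every piece the same column order
  })
  out <- do.call(rbind, padded)
  rownames(out) <- NULL  # plain 1..n row names
  out
}

set.seed(1)
x1 <- LETTERS[1:3]
x2 <- letters[1:3]
x3 <- rnorm(3)
x4 <- rnorm(3)
x5 <- rnorm(3)

DF2 <- rbind_pad(list(data.frame(x1, x2, x3, x4, x5),
                      data.frame(x1, x2, x3, x4, x5),
                      data.frame(x1, x3, x4, x5),
                      data.frame(x2, x3, x4, x5)))
DF2
```

Because `rbind()` on data frames matches columns by name and coerces the logical NA columns to the type of the first piece, the result has the same 12-row layout as `DF`.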
EDIT: I tried the suggested code as follows:
l <- list(data.frame(x1,x2,x3,x4,x5),
data.frame(x1,x3,x4,x5),
data.frame(x2,x3,x4,x5),
data.frame(x1,x2,x3,x4,x5))
merger <- function(l) lapply(2:length(l), function(x) merge(l[[x-1]], l[[x]], all=TRUE))
while (length(l) != 1) l<-merger(l)
l
Which yields:
[[1]]
x1 x3 x4 x5 x2
1 A 0.25492 0.30160 0.259287 a
2 B -0.25937 0.45936 -0.075415 b
3 C -0.53493 1.18316 0.627335 c
Not:
> DF
x1 x2 x3 x4 x5
1 A a 0.25492 0.30160 0.259287
2 B b -0.25937 0.45936 -0.075415
3 C c -0.53493 1.18316 0.627335
4 A a 0.25492 0.30160 0.259287
5 B b -0.25937 0.45936 -0.075415
6 C c -0.53493 1.18316 0.627335
7 A <NA> 0.25492 0.30160 0.259287
8 B <NA> -0.25937 0.45936 -0.075415
9 C <NA> -0.53493 1.18316 0.627335
10 <NA> a 0.25492 0.30160 0.259287
11 <NA> b -0.25937 0.45936 -0.075415
12 <NA> c -0.53493 1.18316 0.627335
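The `merge()` attempt collapses to 3 rows because `merge()` is a join, not a stack: by default it matches rows on all shared columns (`by = intersect(names(x), names(y))`), and here the shared values are identical, so matching rows are combined rather than appended. A small sketch of the difference, reusing the fake data from the question (with `set.seed(1)` added for reproducibility):

```r
set.seed(1)
x1 <- LETTERS[1:3]
x2 <- letters[1:3]
x3 <- rnorm(3)
x4 <- rnorm(3)
x5 <- rnorm(3)

d1 <- data.frame(x1, x2, x3, x4, x5)
d2 <- data.frame(x1, x3, x4, x5)

# merge() joins on the shared columns x1, x3, x4, x5; identical key values
# are matched, so the rows of d2 are folded into those of d1.
m <- merge(d1, d2, all = TRUE)
nrow(m)  # 3, not 6

# rbind() stacks rows instead; d2 first needs an x2 column so names line up.
d2$x2 <- NA
s <- rbind(d1, d2)
nrow(s)  # 6
```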
EDIT 2: Sorry to extend my original post, but my low reputation does not allow me to answer my own question.
Combining Jaron's and daroczig's responses gives exactly what I want. I don't want to assign each data frame to its own object, so putting them in a list and then using rbind.fill works very nicely (see the code below).
Thank you to both of you!
x1<-LETTERS[1:3]
x2<-letters[1:3]
x3<-rnorm(3)
x4<-rnorm(3)
x5<-rnorm(3)
DFlist<-list(data.frame(x1,x2,x3,x4,x5),
data.frame(x1,x3,x4,x5),
data.frame(x2,x3,x4,x5),
data.frame(x1,x2,x3,x4,x5))
library(plyr)  # rbind.fill() comes from the plyr package
rbind.fill(DFlist)
I had to read your question quite a few times before I understood what you were looking for, but maybe you want rbind.fill from plyr:
library(plyr)
d1 <- data.frame(x1,x2,x3,x4,x5)
d2 <- data.frame(x1,x3,x4,x5)
d3 <- data.frame(x2,x3,x4,x5)
d4 <- data.frame(x1,x2,x3,x4,x5)
> rbind.fill(d1,d4,d2,d3)
x1 x2 x3 x4 x5
1 A a 1.1216923 0.9236393 0.2749292
2 B b 1.1913278 1.1145664 -0.5070576
3 C c 0.2837657 -0.6631544 -1.0675885
4 A a 1.1216923 0.9236393 0.2749292
5 B b 1.1913278 1.1145664 -0.5070576
6 C c 0.2837657 -0.6631544 -1.0675885
7 A <NA> 1.1216923 0.9236393 0.2749292
8 B <NA> 1.1913278 1.1145664 -0.5070576
9 C <NA> 0.2837657 -0.6631544 -1.0675885
10 <NA> a 1.1216923 0.9236393 0.2749292
11 <NA> b 1.1913278 1.1145664 -0.5070576
12 <NA> c 0.2837657 -0.6631544 -1.0675885
Using data.table::rbindlist with the fill = TRUE option (note that it returns a data.table):
data.table::rbindlist(
list(data.frame(x1,x2,x3,x4,x5),
data.frame(x1,x3,x4,x5),
data.frame(x2,x3,x4,x5),
data.frame(x1,x2,x3,x4,x5)),
fill = TRUE)
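If you also want to know which data frame each row came from, `rbindlist()` accepts an `idcol` argument; with a named list, the names are used as the ids. This sketch assumes the data.table package is installed; the list names `full`, `no_x2` and `no_x1` and the column name `source` are arbitrary choices for illustration. `as.data.frame()` converts the result back to a plain data.frame.

```r
library(data.table)  # assumes data.table is installed

set.seed(1)
x1 <- LETTERS[1:3]
x2 <- letters[1:3]
x3 <- rnorm(3)
x4 <- rnorm(3)
x5 <- rnorm(3)

res <- as.data.frame(rbindlist(
  list(full  = data.frame(x1, x2, x3, x4, x5),
       no_x2 = data.frame(x1, x3, x4, x5),
       no_x1 = data.frame(x2, x3, x4, x5)),
  fill  = TRUE,      # missing columns are filled with NA
  idcol = "source"   # extra column holding the list name of each row's origin
))
res
```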