Merge data sets by row with differing columns [duplicate]

Tags:

r

I need to merge data sets by row, but they have differing columns. How can I easily get R to stack the rows, add the missing columns, and fill those columns with NAs? Currently I do it like this (very time-consuming for multiple merges):

Creating fake data...

x1<-LETTERS[1:3]
x2<-letters[1:3]
x3<-rnorm(3)
x4<-rnorm(3)
x5<-rnorm(3)

Example of multiple data.frames with some similar columns, some different...

data.frame(x1,x2,x3,x4,x5)
data.frame(x1,x3,x4,x5)
data.frame(x2,x3,x4,x5)
data.frame(x1,x2,x3,x4,x5)

How I merge it now...

DF<-data.frame(rbind(data.frame(x1,x2,x3,x4,x5),
data.frame(x1,x2,x3,x4,x5),
data.frame("x2"=rep(NA,3),data.frame(x1,x3,x4,x5)),
data.frame("x1"=rep(NA,3),data.frame(x2,x3,x4,x5))))

DF

EDIT: I tried the suggested code as follows:

l <- list(data.frame(x1,x2,x3,x4,x5),
          data.frame(x1,x3,x4,x5),
          data.frame(x2,x3,x4,x5),
          data.frame(x1,x2,x3,x4,x5))

merger <- function(l) lapply(2:length(l), function(x) merge(l[[x-1]], l[[x]], all=TRUE)) 
while (length(l) != 1) l<-merger(l) 

l

Which yields:

[[1]]
  x1       x3      x4        x5 x2
1  A  0.25492 0.30160  0.259287  a
2  B -0.25937 0.45936 -0.075415  b
3  C -0.53493 1.18316  0.627335  c

Not:

> DF
     x1   x2       x3      x4        x5
1     A    a  0.25492 0.30160  0.259287
2     B    b -0.25937 0.45936 -0.075415
3     C    c -0.53493 1.18316  0.627335
4     A    a  0.25492 0.30160  0.259287
5     B    b -0.25937 0.45936 -0.075415
6     C    c -0.53493 1.18316  0.627335
7     A <NA>  0.25492 0.30160  0.259287
8     B <NA> -0.25937 0.45936 -0.075415
9     C <NA> -0.53493 1.18316  0.627335
10 <NA>    a  0.25492 0.30160  0.259287
11 <NA>    b -0.25937 0.45936 -0.075415
12 <NA>    c -0.53493 1.18316  0.627335
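
A note on why the merge() loop ends up with three rows: merge() is a join on the columns the two data frames have in common, not a row-stacking operation. Since every data frame here holds the same values in the shared columns, each row finds exactly one match and nothing is appended. The following is only a minimal sketch of that behaviour; the seed and the names d_full and d_part are assumptions added for illustration, not part of the original post.

```r
set.seed(1)  # arbitrary seed (an assumption), only so the sketch is reproducible
x1 <- LETTERS[1:3]
x2 <- letters[1:3]
x3 <- rnorm(3)
x4 <- rnorm(3)
x5 <- rnorm(3)

d_full <- data.frame(x1, x2, x3, x4, x5)  # hypothetical name
d_part <- data.frame(x1, x3, x4, x5)      # hypothetical name

# merge() joins on the shared columns (x1, x3, x4, x5 here).
# Every row of d_part matches exactly one row of d_full, so even with
# all = TRUE the result has one row per match and no rows are stacked.
joined <- merge(d_full, d_part, all = TRUE)
nrow(joined)
```

Stacking rows while padding missing columns is a different operation, which is why an rbind-style function is needed rather than merge().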

EDIT 2: Sorry to extend my original post but my low rep will not allow me to answer my own question.

Combining joran's and daroczig's responses results in exactly what I want. I don't want to assign each data frame to an object, so combining them in a list and then using rbind.fill works very nicely (see code below).

Thank you to both of you!

x1<-LETTERS[1:3] 
x2<-letters[1:3] 
x3<-rnorm(3) 
x4<-rnorm(3) 
x5<-rnorm(3)

DFlist<-list(data.frame(x1,x2,x3,x4,x5), 
             data.frame(x1,x3,x4,x5),
             data.frame(x2,x3,x4,x5), 
             data.frame(x1,x2,x3,x4,x5))

library(plyr)  # rbind.fill() comes from the plyr package
rbind.fill(DFlist)
Tyler Rinker, asked Oct 25 '11 23:10



2 Answers

I had to read your question quite a few times before I understood what you were looking for, but maybe you want rbind.fill from plyr:

library(plyr)  # provides rbind.fill()

d1 <- data.frame(x1,x2,x3,x4,x5)
d2 <- data.frame(x1,x3,x4,x5)
d3 <- data.frame(x2,x3,x4,x5)
d4 <- data.frame(x1,x2,x3,x4,x5)

> rbind.fill(d1,d4,d2,d3)
     x1   x2        x3         x4         x5
1     A    a 1.1216923  0.9236393  0.2749292
2     B    b 1.1913278  1.1145664 -0.5070576
3     C    c 0.2837657 -0.6631544 -1.0675885
4     A    a 1.1216923  0.9236393  0.2749292
5     B    b 1.1913278  1.1145664 -0.5070576
6     C    c 0.2837657 -0.6631544 -1.0675885
7     A <NA> 1.1216923  0.9236393  0.2749292
8     B <NA> 1.1913278  1.1145664 -0.5070576
9     C <NA> 0.2837657 -0.6631544 -1.0675885
10 <NA>    a 1.1216923  0.9236393  0.2749292
11 <NA>    b 1.1913278  1.1145664 -0.5070576
12 <NA>    c 0.2837657 -0.6631544 -1.0675885
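
If adding plyr as a dependency is not an option, roughly the same result can be had in base R by padding each data frame with the columns it lacks before calling rbind. This is only a sketch under stated assumptions: the helper name rbind_fill_base is made up for illustration (it is not a plyr or base R function), the seed is arbitrary, and it has not been checked against rbind.fill for edge cases such as factor columns.

```r
# Hypothetical helper (not part of plyr or base R): stack data frames by row,
# adding any missing columns as NA first.
rbind_fill_base <- function(dfs) {
  # Union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) {
      d[[col]] <- rep(NA, nrow(d))  # add the absent column, filled with NA
    }
    d[all_cols]                     # put columns in a common order
  })
  do.call(rbind, padded)
}

set.seed(1)  # arbitrary seed (an assumption), only for reproducibility
x1 <- LETTERS[1:3]
x2 <- letters[1:3]
x3 <- rnorm(3)
x4 <- rnorm(3)
x5 <- rnorm(3)

DFlist <- list(data.frame(x1, x2, x3, x4, x5),
               data.frame(x1, x3, x4, x5),
               data.frame(x2, x3, x4, x5),
               data.frame(x1, x2, x3, x4, x5))

out <- rbind_fill_base(DFlist)
dim(out)
```

Taking a list as input keeps it in line with the asker's wish not to assign each data frame to its own object.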
joran, answered Nov 01 '22 22:11


Using data.table::rbindlist with the fill = TRUE option:

data.table::rbindlist(
  list(data.frame(x1,x2,x3,x4,x5), 
       data.frame(x1,x3,x4,x5),
       data.frame(x2,x3,x4,x5), 
       data.frame(x1,x2,x3,x4,x5)),
  fill = TRUE)
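
One thing worth knowing about this approach: rbindlist() returns a data.table rather than a plain data.frame. The sketch below shows one way to convert the result back, and also uses the idcol argument to record which list element each row came from. The names DFlist, combined, combined_df and the "source" column label are assumptions chosen for illustration, as is the seed.

```r
library(data.table)

set.seed(1)  # arbitrary seed (an assumption), only for reproducibility
x1 <- LETTERS[1:3]
x2 <- letters[1:3]
x3 <- rnorm(3)
x4 <- rnorm(3)
x5 <- rnorm(3)

DFlist <- list(data.frame(x1, x2, x3, x4, x5),
               data.frame(x1, x3, x4, x5),
               data.frame(x2, x3, x4, x5),
               data.frame(x1, x2, x3, x4, x5))

# fill = TRUE pads missing columns with NA; idcol adds a column giving the
# position of the originating data frame in the (unnamed) list.
combined <- rbindlist(DFlist, fill = TRUE, idcol = "source")

# Convert to a plain data.frame if downstream code expects one.
combined_df <- as.data.frame(combined)
str(combined_df)
```

If the list were named, the "source" column would hold those names instead of integer positions.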
zx8754, answered Nov 01 '22 21:11