
Merge multiple data.frames by row name in R

I would like to merge multiple data.frames in R by their row.names, as a full outer join. I was hoping the following would work:

x = as.data.frame(t(data.frame(a=10, b=13, c=14)))
y = as.data.frame(t(data.frame(a=1, b=2)))
z = as.data.frame(t(data.frame(a=3, b=4, c=3, d=11)))
res = Reduce(function(a,b) merge(a,b,by="row.names",all=T), list(x,y,z))

Warning message:
In merge.data.frame(a, b, by = "row.names", all = T) :
  column name ‘Row.names’ is duplicated in the result
> res
  Row.names Row.names V1.x V1.y V1
    1         1         a   10    1 NA
    2         2         b   13    2 NA
    3         3         c   14   NA NA
    4         a      <NA>   NA   NA  3
    5         b      <NA>   NA   NA  4
    6         c      <NA>   NA   NA  3
    7         d      <NA>   NA   NA 11

What I was hoping to get is:

      V1 V2 V3
    a 10  1  3
    b 13  2  4
    c 14 NA  3
    d NA NA 11
Alex asked Feb 09 '13 00:02


1 Answer

The following works (up to some final column renaming):

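res <- Reduce(function(a, b) {
        # Full outer join on row names; they come back as a "Row.names" column
        ans <- merge(a, b, by = "row.names", all = TRUE)
        # Restore them as real row names so the next merge can match on them
        row.names(ans) <- ans[, "Row.names"]
        # Drop the helper column
        ans[, !names(ans) %in% "Row.names"]
        }, list(x, y, z))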

Indeed:

> res
  V1.x V1.y V1
a   10    1  3
b   13    2  4
c   14   NA  3
d   NA   NA 11
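
The final column renaming mentioned above could be done along these lines. This is only a sketch: the helper name `merge_by_rownames` and the `V1`, `V2`, ... naming scheme are assumptions chosen to match the output the question asks for, and `drop = FALSE` is added as a precaution so the result stays a data.frame.

```r
# Rebuild the inputs from the question
x <- as.data.frame(t(data.frame(a = 10, b = 13, c = 14)))
y <- as.data.frame(t(data.frame(a = 1, b = 2)))
z <- as.data.frame(t(data.frame(a = 3, b = 4, c = 3, d = 11)))

# Hypothetical helper: full outer join on row names, then move the
# "Row.names" column back into the row names of the result
merge_by_rownames <- function(a, b) {
  ans <- merge(a, b, by = "row.names", all = TRUE)
  row.names(ans) <- ans[, "Row.names"]
  ans[, !names(ans) %in% "Row.names", drop = FALSE]
}

res <- Reduce(merge_by_rownames, list(x, y, z))

# Rename V1.x, V1.y, V1 to V1, V2, V3 (one column per input, in order)
names(res) <- paste0("V", seq_along(res))
print(res)
```

Renaming by position works here because each input contributes exactly one column; with wider inputs the names would need to be built differently.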

When merging by row names, the original row names are moved into a Row.names column of the result, and the result itself gets automatic (numeric) row names:

> merge(x,y,by="row.names",all=T)
  Row.names V1.x V1.y
1         a   10    1
2         b   13    2
3         c   14   NA

This behavior is documented in ?merge (under Value):

If the matching involved row names, an extra character column called Row.names is added at the left, and in all cases the result has ‘automatic’ row names.

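When Reduce merges that intermediate result with the next data.frame, the automatic row names (1, 2, 3) are matched against a, b, c, d, so nothing matches unless the row names are restored manually first.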
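
The mismatch can be checked directly. A small sketch using the data from the question (the variable name `first` is just for illustration):

```r
x <- as.data.frame(t(data.frame(a = 10, b = 13, c = 14)))
y <- as.data.frame(t(data.frame(a = 1, b = 2)))
z <- as.data.frame(t(data.frame(a = 3, b = 4, c = 3, d = 11)))

# Result of the first step inside Reduce, without any cleanup
first <- merge(x, y, by = "row.names", all = TRUE)

# The result carries automatic row names, not a, b, c
print(rownames(first))

# No row name is shared with z, so a second merge by row names
# cannot pair up any rows
print(intersect(rownames(first), rownames(z)))
```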

Ryogi answered Oct 07 '22 09:10