I would like to merge
multiple data.frame
in R using row.names
, doing a full outer join. For this I was hoping to do the following:
x = as.data.frame(t(data.frame(a=10, b=13, c=14)))
y = as.data.frame(t(data.frame(a=1, b=2)))
z = as.data.frame(t(data.frame(a=3, b=4, c=3, d=11)))
res = Reduce(function(a,b) merge(a,b,by="row.names",all=T), list(x,y,z))
Warning message:
In merge.data.frame(a, b, by = "row.names", all = T) :
column name ‘Row.names’ is duplicated in the result
> res
Row.names Row.names V1.x V1.y V1
1 1 a 10 1 NA
2 2 b 13 2 NA
3 3 c 14 NA NA
4 a <NA> NA NA 3
5 b <NA> NA NA 4
6 c <NA> NA NA 3
7 d <NA> NA NA 11
What I was hoping to get would be:
V1 V2 V3
a 10 1 3
b 13 2 4
c 14 NA 3
d NA NA 11
The following works (up to some final column renaming):
res <- Reduce(function(a,b){
ans <- merge(a,b,by="row.names",all=T)
row.names(ans) <- ans[,"Row.names"]
ans[,!names(ans) %in% "Row.names"]
}, list(x,y,z))
Indeed:
> res
V1.x V1.y V1
a 10 1 3
b 13 2 4
c 14 NA 3
d NA NA 11
What happens with a row join is that a column with the original rownames is added in the answer, which in turn does not contain row names:
> merge(x,y,by="row.names",all=T)
Row.names V1.x V1.y
1 a 10 1
2 b 13 2
3 c 14 NA
This behavior is documented in ?merge
(under Value)
If the matching involved row names, an extra character column called Row.names is added at the left, and in all cases the result has ‘automatic’ row names.
When Reduce
tries to merge again, it doesn't find any match unless the names are cleaned up manually.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With