Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge Multiple Data Frames by Row Names

I'm trying to merge multiple data frames by row names.

I know how to do it with two:

x = data.frame(a = c(1,2,3), row.names = letters[1:3])
y = data.frame(b = c(1,2,3), row.names = letters[1:3])
merge(x,y, by = "row.names")

But when I try using the reshape package's merge_all() I'm getting an error.

z = data.frame(c = c(1,2,3), row.names = letters[1:3])
l = list(x,y,z)
merge_all(l, by = "row.names")

Error in -ncol(df) : invalid argument to unary operator

What's the best way to do this?

like image 367
Dirk Calloway Avatar asked Mar 24 '14 18:03

Dirk Calloway


People also ask

How do I merge Dataframes by row names in R?

The merge() function in base R can be used to merge input dataframes by common columns or row names. The merge() function retains all the row names of the dataframes, behaving similarly to the inner join. The dataframes are combined in order of the appearance in the input function call.

Can I merge more than 2 data frames in R?

Join Multiple R DataFrames To join more than two (multiple) R dataframes, then reduce() is used. It is available in the tidyverse package which will convert all the dataframes to a list and join the dataframes based on the column. It is similar to the above joins.


2 Answers

Merging by row.names does weird things - it creates a column called Row.names, which makes subsequent merges hard.

To avoid that issue you can instead create a column with the row names (which is generally a better idea anyway - row names are very limited and hard to manipulate). One way of doing that with the data as given in OP (not the most optimal way, for more optimal and easier ways of dealing with rectangular data I recommend getting to know data.table instead):

Reduce(merge, lapply(l, function(x) data.frame(x, rn = row.names(x))))
like image 81
eddi Avatar answered Sep 24 '22 11:09

eddi


maybe there exists a faster version using do.call or *apply, but this works in your case:

x = data.frame(X = c(1,2,3), row.names = letters[1:3])
y = data.frame(Y = c(1,2,3), row.names = letters[1:3])
z = data.frame(Z = c(1,2,3), row.names = letters[1:3])

merge.all <- function(x, ..., by = "row.names") {
  L <- list(...)
  for (i in seq_along(L)) {
    x <- merge(x, L[[i]], by = by)
    rownames(x) <- x$Row.names
    x$Row.names <- NULL
  }
  return(x)
}

merge.all(x,y,z)

important may be to define all the parameters (like by) in the function merge.all you want to forward to merge since the whole ... arguments are used in the list of objects to merge.

like image 32
setempler Avatar answered Sep 24 '22 11:09

setempler