
merge vectors of a list using row names in R

Tags: merge, list, r, matrix

I have a list containing 5 matrices, each of a different size, and I would like to merge all of them by their row names.

Here is a reproducible example of my list (I am using igraph_0.6.5-2 on R version 3.0.1):

x <- list(    
as.matrix(c(1,4)),
as.matrix(c(3,19,11)),
as.matrix(c(3,9,8,5)),
as.matrix(c(3,10,8,87,38,92)),
as.matrix(c(87,8,8,87,38,92))  
)   

colnames(x[[1]]) <- c("P1")  
colnames(x[[2]]) <- c("P2")  
colnames(x[[3]]) <- c("P3")  
colnames(x[[4]]) <- c("P4")  
colnames(x[[5]]) <- c("P5")  
rownames(x[[1]]) <- c("A","B")   
rownames(x[[2]]) <- c("B","C","D")  
rownames(x[[3]]) <- c("A","B", "E", "F")  
rownames(x[[4]]) <- c("A","F","G","H","I","J" )  
rownames(x[[5]]) <- c("B", "H","I","J", "K","L")  

which gives me the following list:

> x
[[1]]
  P1
A  1
B  4
[[2]]
  P2
B  3
C 19
D 11
[[3]]
  P3
A  3
B  9
E  8
F  5
[[4]]
  P4
A  3
F 10
G  8
H 87
I 38
J 92
[[5]]
  P5
B 87
H  8
I  8
J 87
K 38
L 92

I would like to obtain something like this:

    P1  P2  P3  P4  P5
A    1  NA   3   3  NA
B    4   3   9  NA  87
C   NA  19  NA  NA  NA
D   NA  11  NA  NA  NA
E   NA  NA   8  NA  NA
F   NA  NA   5  10  NA
G   NA  NA  NA   8  NA
H   NA  NA  NA  87   8
I   NA  NA  NA  38   8
J   NA  NA  NA  92  87
K   NA  NA  NA  NA  38
L   NA  NA  NA  NA  92

Merging them using the do.call function:

y <- do.call(merge,c(x, by="row.names",all=TRUE))

gives me the following error:

Error in fix.by(by.x, x) : 'by' must match numbers of columns

Any help is greatly appreciated. Thanks!

asked Aug 01 '13 16:08 by Charlie



1 Answer

I would create a helper function to move your row.names() to a column in a data.frame, and use Reduce() to merge() all the data.frames in your list:

rownames2col <- function(inDF, RowName = ".rownames") {
  temp <- data.frame(rownames(inDF), inDF, row.names = NULL)
  names(temp)[1] <- RowName
  temp
}

Reduce(function(x, y) merge(x, y, by = ".rownames", all = TRUE), 
       lapply(x, rownames2col))
#    .rownames P1 P2 P3 P4 P5
# 1          A  1 NA  3  3 NA
# 2          B  4  3  9 NA 87
# 3          C NA 19 NA NA NA
# 4          D NA 11 NA NA NA
# 5          E NA NA  8 NA NA
# 6          F NA NA  5 10 NA
# 7          G NA NA NA  8 NA
# 8          H NA NA NA 87  8
# 9          I NA NA NA 38  8
# 10         J NA NA NA 92 87
# 11         K NA NA NA NA 38
# 12         L NA NA NA NA 92

The reason for the added step of bringing the rownames() in as a column is that merging by "row.names" creates a column called Row.names and resets the actual row names to 1, 2, 3, ... on the first merge() in Reduce(). The next merge() by "row.names" would then have only those numbers to match on, so the subsequent list items cannot be conveniently merged.

> Reduce(function(x, y) merge(x, y, by = "row.names", all = TRUE), x[1:2])
  Row.names P1 P2
1         A  1 NA
2         B  4  3
3         C NA 19
4         D NA 11
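
To make that concrete, here is a minimal sketch (not part of the approach above) that inspects the result of a single merge() by "row.names" and then shows one possible workaround: restoring the row names after every step. The helper name mergeKeepRownames is hypothetical, and only the first three matrices from the question are rebuilt to keep the example short.

```r
## Rebuild the first three list items from the question
## (matrix() returns a one-column matrix when nrow/ncol are omitted)
x <- list(
  matrix(c(1, 4),       dimnames = list(c("A", "B"), "P1")),
  matrix(c(3, 19, 11),  dimnames = list(c("B", "C", "D"), "P2")),
  matrix(c(3, 9, 8, 5), dimnames = list(c("A", "B", "E", "F"), "P3"))
)

## One merge by row names: the labels move into a "Row.names" column
## and the row names of the result are reset to "1", "2", ...
first <- merge(x[[1]], x[[2]], by = "row.names", all = TRUE)
names(first)
rownames(first)

## Hypothetical helper: put the labels back as row names after each
## merge so that the next merge by "row.names" still has them
mergeKeepRownames <- function(a, b) {
  m <- merge(a, b, by = "row.names", all = TRUE)
  rownames(m) <- as.character(m[["Row.names"]])
  m[, -1, drop = FALSE]
}

res <- Reduce(mergeKeepRownames, x)
res
```

Whether this is preferable to the rownames2col() helper is a matter of taste: it keeps the labels as row names throughout, at the cost of an extra assignment per step.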

Update: A data.table approach

A very similar concept can be used with data.table by setting the keep.rownames argument to TRUE and setting the key to the resulting "rn" column.

library(data.table)
Reduce(function(x, y) merge(x, y, all = TRUE), 
       lapply(x, function(y) data.table(y, keep.rownames=TRUE, key = "rn")))
#     rn P1 P2 P3 P4 P5
#  1:  A  1 NA  3  3 NA
#  2:  B  4  3  9 NA 87
#  3:  C NA 19 NA NA NA
#  4:  D NA 11 NA NA NA
#  5:  E NA NA  8 NA NA
#  6:  F NA NA  5 10 NA
#  7:  G NA NA NA  8 NA
#  8:  H NA NA NA 87  8
#  9:  I NA NA NA 38  8
# 10:  J NA NA NA 92 87
# 11:  K NA NA NA NA 38
# 12:  L NA NA NA NA 92

Update 2: A "manual" approach

There is, of course, the manual approach, assisted by a for loop. This might actually be faster than the approaches above, because merge() is fairly slow compared to basic subsetting. Another advantage with respect to speed is that the resulting object is a matrix, and many matrix operations are faster than the equivalent data.frame operations.

## Identify the unique "rownames" for all list items
Rows <- unique(unlist(lapply(x, rownames)))

## Create a matrix of NA values 
##   with appropriate dimensions and dimnames
myMat <- matrix(NA, nrow = length(Rows), ncol = length(x), 
                dimnames = list(Rows, sapply(x, colnames)))


## Use your `for` loop to fill it in
##   with the appropriate values from your list
for (i in seq_along(x)) {
  myMat[rownames(x[[i]]), i] <- x[[i]]
}
myMat
#   P1 P2 P3 P4 P5
# A  1 NA  3  3 NA
# B  4  3  9 NA 87
# C NA 19 NA NA NA
# D NA 11 NA NA NA
# E NA NA  8 NA NA
# F NA NA  5 10 NA
# G NA NA NA  8 NA
# H NA NA NA 87  8
# I NA NA NA 38  8
# J NA NA NA 92 87
# K NA NA NA NA 38
# L NA NA NA NA 92
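
As a sanity check, the following sketch rebuilds the list from the question and confirms that the manual approach and the merge()-based approach produce the same values. The names fromMerge and same are introduced here for illustration only, and the matrix is pre-filled with NA_real_ instead of NA purely so that it is numeric from the start. No timings are shown; wrapping each approach in system.time() would be one way to test the speed claim on your own data.

```r
## Rebuild the full list from the question
x <- list(
  matrix(c(1, 4),       dimnames = list(c("A", "B"), "P1")),
  matrix(c(3, 19, 11),  dimnames = list(c("B", "C", "D"), "P2")),
  matrix(c(3, 9, 8, 5), dimnames = list(c("A", "B", "E", "F"), "P3")),
  matrix(c(3, 10, 8, 87, 38, 92),
         dimnames = list(c("A", "F", "G", "H", "I", "J"), "P4")),
  matrix(c(87, 8, 8, 87, 38, 92),
         dimnames = list(c("B", "H", "I", "J", "K", "L"), "P5"))
)

## merge()-based result, converted to a matrix keyed by row name
rownames2col <- function(inDF, RowName = ".rownames") {
  temp <- data.frame(rownames(inDF), inDF, row.names = NULL)
  names(temp)[1] <- RowName
  temp
}
merged <- Reduce(function(a, b) merge(a, b, by = ".rownames", all = TRUE),
                 lapply(x, rownames2col))
fromMerge <- as.matrix(merged[, -1])
rownames(fromMerge) <- as.character(merged[[".rownames"]])

## Manual result: NA-filled matrix, filled in by row name
Rows <- unique(unlist(lapply(x, rownames)))
myMat <- matrix(NA_real_, nrow = length(Rows), ncol = length(x),
                dimnames = list(Rows, sapply(x, colnames)))
for (i in seq_along(x)) {
  myMat[rownames(x[[i]]), i] <- x[[i]]
}

## Compare after putting rows and columns in the same order
same <- isTRUE(all.equal(fromMerge[rownames(myMat), colnames(myMat)], myMat))
same
```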

answered Sep 21 '22 20:09 by A5C1D2H2I1M1N2O1R2T1