I have a list containing 5 matrices, each of a different size, and I would like to merge all of them by their row names.
Here is a reproducible example of my list (I am using igraph_0.6.5-2 on R version 3.0.1):
x <- list(
as.matrix(c(1,4)),
as.matrix(c(3,19,11)),
as.matrix(c(3,9,8,5)),
as.matrix(c(3,10,8,87,38,92)),
as.matrix(c(87,8,8,87,38,92))
)
colnames(x[[1]]) <- c("P1")
colnames(x[[2]]) <- c("P2")
colnames(x[[3]]) <- c("P3")
colnames(x[[4]]) <- c("P4")
colnames(x[[5]]) <- c("P5")
rownames(x[[1]]) <- c("A","B")
rownames(x[[2]]) <- c("B","C","D")
rownames(x[[3]]) <- c("A","B", "E", "F")
rownames(x[[4]]) <- c("A","F","G","H","I","J" )
rownames(x[[5]]) <- c("B", "H","I","J", "K","L")
which gives me the following list:
> x
[[1]]
P1
A 1
B 4
[[2]]
P2
B 3
C 19
D 11
[[3]]
P3
A 3
B 9
E 8
F 5
[[4]]
P4
A 3
F 10
G 8
H 87
I 38
J 92
[[5]]
P5
B 87
H 8
I 8
J 87
K 38
L 92
I would like to obtain something like this:
> P1 P2 P3 P4 P5
A 1 na 3 3 na
B 4 3 9 na 87
C na 19 na na na
D na 11 na na na
E na na 8 na na
F na na 5 10 na
G na na na 8 na
H na na na 87 8
I na na na 38 8
J na na na 92 87
K na na na na 38
L na na na na 92
Merging them using the do.call function:
y <- do.call(merge,c(x, by="row.names",all=TRUE))
gives me the following error:
Error in fix.by(by.x, x) : 'by' must match numbers of columns
Any help is greatly appreciated. Thanks!
I would create a helper function to move your row.names() to a column in a data.frame, and use Reduce() to merge() all the data.frames in your list:
rownames2col <- function(inDF, RowName = ".rownames") {
  temp <- data.frame(rownames(inDF), inDF, row.names = NULL)
  names(temp)[1] <- RowName
  temp
}
Reduce(function(x, y) merge(x, y, by = ".rownames", all = TRUE),
       lapply(x, rownames2col))
# .rownames P1 P2 P3 P4 P5
# 1 A 1 NA 3 3 NA
# 2 B 4 3 9 NA 87
# 3 C NA 19 NA NA NA
# 4 D NA 11 NA NA NA
# 5 E NA NA 8 NA NA
# 6 F NA NA 5 10 NA
# 7 G NA NA NA 8 NA
# 8 H NA NA NA 87 8
# 9 I NA NA NA 38 8
# 10 J NA NA NA 92 87
# 11 K NA NA NA NA 38
# 12 L NA NA NA NA 92
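If the result should carry the names as row names, as in the layout asked for in the question, the merged data.frame can be converted back to a matrix afterwards. The following is only a sketch: `mk`, `merged` and `out` are names made up here for illustration, and the example list is rebuilt so that the block runs on its own.

```r
# Hypothetical helper to rebuild the example list from the question compactly
mk <- function(values, rows, col) matrix(values, dimnames = list(rows, col))
x <- list(
  mk(c(1, 4), c("A", "B"), "P1"),
  mk(c(3, 19, 11), c("B", "C", "D"), "P2"),
  mk(c(3, 9, 8, 5), c("A", "B", "E", "F"), "P3"),
  mk(c(3, 10, 8, 87, 38, 92), c("A", "F", "G", "H", "I", "J"), "P4"),
  mk(c(87, 8, 8, 87, 38, 92), c("B", "H", "I", "J", "K", "L"), "P5")
)

# Same helper as above: copy the row names into an ordinary column
rownames2col <- function(inDF, RowName = ".rownames") {
  temp <- data.frame(rownames(inDF), inDF, row.names = NULL)
  names(temp)[1] <- RowName
  temp
}

merged <- Reduce(function(a, b) merge(a, b, by = ".rownames", all = TRUE),
                 lapply(x, rownames2col))

# Drop the key column (merge() puts it first) and reuse it as row names
out <- as.matrix(merged[-1])
rownames(out) <- as.character(merged[[".rownames"]])
out
```

The `as.character()` call is there because, on older R versions, the key column may come back as a factor.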
The reason for the added step of bringing the rownames() in as a column is that merging by row.names creates a column called Row.names on the first merge() in Reduce(), and the result gets new automatic row names, so the subsequent list items cannot be conveniently merged in the same way.
> Reduce(function(x, y) merge(x, y, by = "row.names", all = TRUE), x[1:2])
Row.names P1 P2
1 A 1 NA
2 B 4 3
3 C NA 19
4 D NA 11
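One possible way to keep merging by row.names anyway is to put the Row.names column back into the row names after every step. This is a sketch of that alternative, not the approach used above; `mk`, `merge_rn` and `res` are hypothetical names introduced for illustration, and the example list is rebuilt so the block is self-contained.

```r
# Hypothetical helper to rebuild the example list from the question compactly
mk <- function(values, rows, col) matrix(values, dimnames = list(rows, col))
x <- list(
  mk(c(1, 4), c("A", "B"), "P1"),
  mk(c(3, 19, 11), c("B", "C", "D"), "P2"),
  mk(c(3, 9, 8, 5), c("A", "B", "E", "F"), "P3"),
  mk(c(3, 10, 8, 87, 38, 92), c("A", "F", "G", "H", "I", "J"), "P4"),
  mk(c(87, 8, 8, 87, 38, 92), c("B", "H", "I", "J", "K", "L"), "P5")
)

# Merge two objects by row names, then restore the row names so that
# the result can itself be merged by row names in the next step
merge_rn <- function(a, b) {
  m <- merge(a, b, by = "row.names", all = TRUE)
  rownames(m) <- as.character(m$Row.names)
  m$Row.names <- NULL
  m
}

res <- Reduce(merge_rn, x)
res
```

Here `res` is a data.frame whose row names are the labels from the original matrices. Whether this is preferable to the helper-column version is mostly a matter of taste.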
data.table approach
A very similar concept can be used with data.table by setting the keep.rownames argument to TRUE and setting the key to the resulting "rn" column.
library(data.table)
Reduce(function(x, y) merge(x, y, all = TRUE),
lapply(x, function(y) data.table(y, keep.rownames=TRUE, key = "rn")))
# rn P1 P2 P3 P4 P5
# 1: A 1 NA 3 3 NA
# 2: B 4 3 9 NA 87
# 3: C NA 19 NA NA NA
# 4: D NA 11 NA NA NA
# 5: E NA NA 8 NA NA
# 6: F NA NA 5 10 NA
# 7: G NA NA NA 8 NA
# 8: H NA NA NA 87 8
# 9: I NA NA NA 38 8
# 10: J NA NA NA 92 87
# 11: K NA NA NA NA 38
# 12: L NA NA NA NA 92
There is, of course, the manual approach, assisted by a for loop. This might actually be faster than the above because merge is pretty slow in comparison to basic subsetting. Another advantage with respect to speed is that your resulting object is a matrix, and many matrix operations are faster than data.frame operations.
## Identify the unique "rownames" for all list items
Rows <- unique(unlist(lapply(x, rownames)))
## Create a matrix of NA values
## with appropriate dimensions and dimnames
myMat <- matrix(NA, nrow = length(Rows), ncol = length(x),
                dimnames = list(Rows, sapply(x, colnames)))
## Use your `for` loop to fill it in
## with the appropriate values from your list
for (i in seq_along(x)) {
  myMat[rownames(x[[i]]), i] <- x[[i]]
}
myMat
# P1 P2 P3 P4 P5
# A 1 NA 3 3 NA
# B 4 3 9 NA 87
# C NA 19 NA NA NA
# D NA 11 NA NA NA
# E NA NA 8 NA NA
# F NA NA 5 10 NA
# G NA NA NA 8 NA
# H NA NA NA 87 8
# I NA NA NA 38 8
# J NA NA NA 92 87
# K NA NA NA NA 38
# L NA NA NA NA 92
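The same fill-in idea can also be written without an explicit loop, by looking up each matrix's rows with match(). This is a sketch under the assumption that every list item is a one-column numeric matrix with unique row names, as in the example; `mk` and `myMat2` are names made up here, and the list is rebuilt so the block runs on its own.

```r
# Hypothetical helper to rebuild the example list from the question compactly
mk <- function(values, rows, col) matrix(values, dimnames = list(rows, col))
x <- list(
  mk(c(1, 4), c("A", "B"), "P1"),
  mk(c(3, 19, 11), c("B", "C", "D"), "P2"),
  mk(c(3, 9, 8, 5), c("A", "B", "E", "F"), "P3"),
  mk(c(3, 10, 8, 87, 38, 92), c("A", "F", "G", "H", "I", "J"), "P4"),
  mk(c(87, 8, 8, 87, 38, 92), c("B", "H", "I", "J", "K", "L"), "P5")
)

## Identify the unique "rownames" for all list items
Rows <- unique(unlist(lapply(x, rownames)))

## For each matrix, pick the value for every name in Rows;
## names that are absent give NA from match() and therefore an NA value
myMat2 <- vapply(x,
                 function(m) m[match(Rows, rownames(m)), 1],
                 numeric(length(Rows)))

## Label the result explicitly rather than relying on inherited names
dimnames(myMat2) <- list(Rows, vapply(x, colnames, character(1)))
myMat2
```

vapply() is used instead of sapply() so that a list item of an unexpected shape raises an error rather than silently changing the shape of the result.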