I have lists of varying length that contain data frames. I want to merge the dfs in each list into a single df, using a key column (given by name or index) that differs from df to df. Here's an example with 3 dfs:
my.list <- list(
  data.frame(a = 1:10, b = letters[1:10], c = 101:110),
  data.frame(d = 6:15, e = letters[1:10], f = 1:10),
  data.frame(l = 2:11, m = letters[11:20], o = 1:10))
and I want to merge by a specific column of each df mentioned in ids
ids <- c('a', 'f', 'l')
to get something that looks like
id b c d e m o
1 a 101 6 a NA NA
2 b 102 7 b k 1
3 c 103 8 c l 2
4 d 104 9 d m 3
5 e 105 10 e n 4
6 f 106 11 f o 5
7 g 107 12 g p 6
8 h 108 13 h q 7
9 i 109 14 i r 8
10 j 110 15 j s 9
11 NA NA NA NA t 10
I've tried to do this with merge and/or Reduce, but failed to pass on the ids.
We can give the key column the same name in every list element: rename the column that matches the corresponding entry of ids to 'id', then Reduce with merge:
lst <- Map(function(x, y) {names(x)[match(y, names(x))] <- 'id'; x}, my.list, ids)
Reduce(function(...) merge(..., by = 'id', all = TRUE), lst)
#   id    b   c  d    e    m  o
#1   1    a 101  6    a <NA> NA
#2   2    b 102  7    b    k  1
#3   3    c 103  8    c    l  2
#4   4    d 104  9    d    m  3
#5   5    e 105 10    e    n  4
#6   6    f 106 11    f    o  5
#7   7    g 107 12    g    p  6
#8   8    h 108 13    h    q  7
#9   9    i 109 14    i    r  8
#10 10    j 110 15    j    s  9
#11 11 <NA>  NA NA <NA>    t 10
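If this is needed for many lists, the rename-then-merge idea can be wrapped in a small helper. The following is only a sketch: merge_by_ids and its new_name argument are hypothetical names that do not come from the answer above, and it assumes each key is either a column name or a single column index.

```r
my.list <- list(
  data.frame(a = 1:10, b = letters[1:10], c = 101:110),
  data.frame(d = 6:15, e = letters[1:10], f = 1:10),
  data.frame(l = 2:11, m = letters[11:20], o = 1:10))
ids <- c('a', 'f', 'l')

# Hypothetical helper: rename each key column to `new_name`, then full-join.
merge_by_ids <- function(dfs, ids, new_name = 'id') {
  stopifnot(length(dfs) == length(ids))
  renamed <- Map(function(df, key) {
    # Assumption: a numeric key is a column index, anything else a column name
    pos <- if (is.numeric(key)) key else match(key, names(df))
    if (is.na(pos) || pos > ncol(df)) stop("key '", key, "' not found")
    names(df)[pos] <- new_name
    df
  }, dfs, ids)
  Reduce(function(x, y) merge(x, y, by = new_name, all = TRUE), renamed)
}

res  <- merge_by_ids(my.list, ids)
# Mixed names and indices have to be passed as a list
res2 <- merge_by_ids(my.list, list('a', 3, 'l'))
```

Checking that the key exists up front gives a clearer error than letting merge fail on a missing 'id' column.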
Here is a data.table answer with an approach similar to @akrun's answer.
However, instead of renaming the columns, we'll set them as keys. Then we can merge by keys, rather than by name. This preserves the column names.
library(data.table)
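# Relies on my.list and ids from the calling (global) environment
funky <- function(x) {
  setDT(my.list[[x]])             # convert to data.table in place
  setkeyv(my.list[[x]], ids[x])   # key on the column named in ids[x]
  return(NULL)                    # nothing useful to return
}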
So this function will be passed an index x. First, it converts the data.frame in the xth position of my.list to a data.table. Then, it sets the key of this new data.table to the column name given at the same position in ids. Finally, since this is all done in place, it returns NULL to prevent useless printout.
Now apply the function to all of the objects in the list.
a <- lapply(seq_along(ids), funky)
Reduce(function(x, y) merge(x, y,
                            by.x = key(x),
                            by.y = key(y),
                            all = TRUE),
       my.list)
Unpacking the Reduce, we can specify the columns to merge by using key(x) and key(y). This is the step that lets us avoid modifying the column names.
#      a  b   c  d  e  m  o
#  1:  1  a 101  6  a NA NA
#  2:  2  b 102  7  b  k  1
#  3:  3  c 103  8  c  l  2
#  4:  4  d 104  9  d  m  3
#  5:  5  e 105 10  e  n  4
#  6:  6  f 106 11  f  o  5
#  7:  7  g 107 12  g  p  6
#  8:  8  h 108 13  h  q  7
#  9:  9  i 109 14  i  r  8
# 10: 10  j 110 15  j  s  9
# 11: 11 NA  NA NA NA  t 10
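One side effect of the approach above is that my.list itself is turned into a list of data.tables. If that is not wanted, a possible variation (a sketch only, not part of the answer above) is to key copies made with as.data.table and leave the original list untouched. The name keyed is made up for this illustration.

```r
library(data.table)

my.list <- list(
  data.frame(a = 1:10, b = letters[1:10], c = 101:110),
  data.frame(d = 6:15, e = letters[1:10], f = 1:10),
  data.frame(l = 2:11, m = letters[11:20], o = 1:10))
ids <- c('a', 'f', 'l')

# Build keyed copies; as.data.table() copies, so my.list stays a list of data.frames
keyed <- Map(function(df, key) {
  dt <- as.data.table(df)
  setkeyv(dt, key)
  dt
}, my.list, ids)

# Same Reduce as above, applied to the keyed copies
res <- Reduce(function(x, y) merge(x, y,
                                   by.x = key(x),
                                   by.y = key(y),
                                   all = TRUE),
              keyed)
```

The merged table keeps the first key name (a), as in the output shown above.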
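Another idea is to turn the columns of interest into row names, copy those into a shared id column, and merge on that. The original key columns (a, f, l) are kept, and because row names are text, id sorts as 1, 10, 2, ...: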
l1 <- Map(function(x, y) {rownames(x) <- x[[y]]; x}, my.list, ids)
Reduce(function(x, y) merge(x, y, all = TRUE),
       lapply(l1, function(x) data.frame(x, id = rownames(x))))
#   id  a    b   c  d    e  f  l    m  o
#1   1  1    a 101  6    a  1 NA <NA> NA
#2  10 10    j 110 15    j 10 10    s  9
#3   2  2    b 102  7    b  2  2    k  1
#4   3  3    c 103  8    c  3  3    l  2
#5   4  4    d 104  9    d  4  4    m  3
#6   5  5    e 105 10    e  5  5    n  4
#7   6  6    f 106 11    f  6  6    o  5
#8   7  7    g 107 12    g  7  7    p  6
#9   8  8    h 108 13    h  8  8    q  7
#10  9  9    i 109 14    i  9  9    r  8
#11 11 NA <NA>  NA NA <NA> NA 11    t 10
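To get from this result to the layout asked for in the question, the id column can be converted back to a number, the rows reordered, and the original key columns dropped. This is a sketch that assumes the ids are whole numbers stored as text, which holds for the example data but may not hold in general.

```r
my.list <- list(
  data.frame(a = 1:10, b = letters[1:10], c = 101:110),
  data.frame(d = 6:15, e = letters[1:10], f = 1:10),
  data.frame(l = 2:11, m = letters[11:20], o = 1:10))
ids <- c('a', 'f', 'l')

l1 <- Map(function(x, y) {rownames(x) <- x[[y]]; x}, my.list, ids)
res <- Reduce(function(x, y) merge(x, y, all = TRUE),
              lapply(l1, function(x) data.frame(x, id = rownames(x))))

# Assumption: ids are whole numbers; as.character() also covers a factor id
res$id <- as.integer(as.character(res$id))
# Sort numerically and drop the original key columns (a, f, l)
res <- res[order(res$id), setdiff(names(res), ids)]
rownames(res) <- NULL
```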