merge list of data frames by different ids

Question

I have lists of varying length that contain data frames. I want to merge the data frames in each list into a single data frame, using a key column (given by name or index) that differs from one data frame to the next. Here is an example with three data frames:

my.list <- list(
data.frame(a = 1:10, b = letters[1:10], c = 101:110),
data.frame(d = 6:15, e = letters[1:10], f = 1:10),
data.frame(l = 2:11, m = letters[11:20], o = 1:10))

I want to merge on one specific column per data frame, listed in ids:

ids <- c('a', 'f', 'l')

to get something that looks like this:

id  b   c   d   e   m   o
1   a   101 6   a   NA  NA
2   b   102 7   b   k   1
3   c   103 8   c   l   2
4   d   104 9   d   m   3
5   e   105 10  e   n   4
6   f   106 11  f   o   5
7   g   107 12  g   p   6
8   h   108 13  h   q   7
9   i   109 14  i   r   8
10  j   110 15  j   s   9
11  NA  NA  NA  NA  t   10

I've tried to do this with merge and/or Reduce, but could not work out how to pass the ids along.

akrun · Accepted Answer

We can give the key column the same name in every list element by renaming the column named in 'ids' to 'id', and then fold the list with Reduce and merge:

lst <- Map(function(x, y) {names(x)[match(y, names(x))] <- 'id'; x}, my.list, ids)
Reduce(function(...) merge(..., by = 'id', all = TRUE), lst)
#   id    b   c  d    e    m  o
#1   1    a 101  6    a <NA> NA
#2   2    b 102  7    b    k  1
#3   3    c 103  8    c    l  2
#4   4    d 104  9    d    m  3
#5   5    e 105 10    e    n  4
#6   6    f 106 11    f    o  5
#7   7    g 107 12    g    p  6
#8   8    h 108 13    h    q  7
#9   9    i 109 14    i    r  8
#10 10    j 110 15    j    s  9
#11 11 <NA>  NA NA <NA>    t 10
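
The two steps above can be wrapped in a small helper. This is only a minimal sketch and not part of the original answer: the name merge_by_ids and its arguments are hypothetical, and it assumes each entry of ids is either a column name or a numeric column position, since the question mentions both.

```r
# Hypothetical helper (not from the original answer): rename each data
# frame's key column to a common name, then fold the list with merge().
merge_by_ids <- function(dfs, ids, key = "id") {
  stopifnot(length(dfs) == length(ids))
  renamed <- Map(function(df, id) {
    # Accept either a column position or a column name
    pos <- if (is.numeric(id)) as.integer(id) else match(id, names(df))
    if (is.na(pos) || pos < 1L || pos > ncol(df)) {
      stop("id column not found: ", id)
    }
    names(df)[pos] <- key
    df
  }, dfs, ids)
  # all = TRUE keeps rows that appear in only some of the data frames
  Reduce(function(x, y) merge(x, y, by = key, all = TRUE), renamed)
}

my.list <- list(
  data.frame(a = 1:10, b = letters[1:10], c = 101:110),
  data.frame(d = 6:15, e = letters[1:10], f = 1:10),
  data.frame(l = 2:11, m = letters[11:20], o = 1:10))

# Mixed specification: name, position (the 3rd column is 'f'), name
res <- merge_by_ids(my.list, list("a", 3, "l"))
```

Passing ids as a list allows names and positions to be mixed; a plain character vector such as c('a', 'f', 'l') works as well. The sketch does not guard against a data frame that already has another column called 'id'.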

Eric Watt · Answer

Here is a data.table answer that takes a similar approach to @akrun's answer.

However, instead of renaming the columns, we'll set them as keys. Then we can merge by key rather than by name, so no column has to be renamed. The merged key column keeps the name it has in the first table (a in this example).

library(data.table)

funky <- function(x) {
  setDT(my.list[[x]])
  setkeyv(my.list[[x]], ids[x])
  return(NULL)
}

This function takes an index x. First, it converts the data.frame at position x of my.list to a data.table in place. Then it sets the key of that data.table to the column named at the same position in ids. Because all of this happens by reference, the function returns NULL to avoid useless printed output.

Now apply the function to all of the objects in the list.

a <- lapply(seq_along(ids), funky)
Reduce(function(x, y) merge(x, 
                            y, 
                            by.x = key(x), 
                            by.y = key(y), 
                            all = TRUE), 
       my.list)

Inside the Reduce, key(x) and key(y) supply the columns to merge on for each pair of tables. This is the step that lets us avoid modifying the column names.

#      a  b   c  d  e  m  o
#  1:  1  a 101  6  a NA NA
#  2:  2  b 102  7  b  k  1
#  3:  3  c 103  8  c  l  2
#  4:  4  d 104  9  d  m  3
#  5:  5  e 105 10  e  n  4
#  6:  6  f 106 11  f  o  5
#  7:  7  g 107 12  g  p  6
#  8:  8  h 108 13  h  q  7
#  9:  9  i 109 14  i  r  8
# 10: 10  j 110 15  j  s  9
# 11: 11 NA  NA NA NA  t 10
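
A variation on the same idea that leaves my.list untouched is sketched below. It is not from the original answer: the name keyed is introduced here for illustration, and the sketch relies on merge() returning a table keyed on by.x (the default with sort = TRUE), so that key(x) is still available on the intermediate results.

```r
library(data.table)

my.list <- list(
  data.frame(a = 1:10, b = letters[1:10], c = 101:110),
  data.frame(d = 6:15, e = letters[1:10], f = 1:10),
  data.frame(l = 2:11, m = letters[11:20], o = 1:10))
ids <- c("a", "f", "l")

# Build keyed data.tables as new objects instead of converting the
# elements of my.list in place ('keyed' is a name made up for this sketch)
keyed <- Map(function(df, id) {
  dt <- as.data.table(df)  # returns a new data.table
  setkeyv(dt, id)          # sets the key by reference on that new object
  dt
}, my.list, ids)

# Merge pairwise on each table's key; the result stays keyed on by.x
res <- Reduce(function(x, y) merge(x, y, by.x = key(x), by.y = key(y),
                                   all = TRUE),
              keyed)
```

Here the function returns the keyed table instead of NULL, so no global state is modified and my.list still holds plain data frames afterwards.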

Sotos · Answer

Another idea is to turn the columns of interest into row names, copy those into a shared id column, and merge on that, i.e.

l1 <- Map(function(x, y) {rownames(x) <- x[[y]]; x}, my.list, ids)
Reduce(function(x, y) merge(x, y, all = TRUE),
       lapply(l1, function(x) data.frame(x, id = rownames(x))))

#   id  a    b   c  d    e  f  l    m  o
#1   1  1    a 101  6    a  1 NA <NA> NA
#2  10 10    j 110 15    j 10 10    s  9
#3   2  2    b 102  7    b  2  2    k  1
#4   3  3    c 103  8    c  3  3    l  2
#5   4  4    d 104  9    d  4  4    m  3
#6   5  5    e 105 10    e  5  5    n  4
#7   6  6    f 106 11    f  6  6    o  5
#8   7  7    g 107 12    g  7  7    p  6
#9   8  8    h 108 13    h  8  8    q  7
#10  9  9    i 109 14    i  9  9    r  8
#11 11 NA <NA>  NA NA <NA> NA 11    t 10
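
In the output above, id is built from row names and is therefore sorted as text (1, 10, 2, ...). If numeric order is wanted, a possible follow-up is sketched here. It is not part of the original answer and assumes every id value can be converted to an integer.

```r
my.list <- list(
  data.frame(a = 1:10, b = letters[1:10], c = 101:110),
  data.frame(d = 6:15, e = letters[1:10], f = 1:10),
  data.frame(l = 2:11, m = letters[11:20], o = 1:10))
ids <- c("a", "f", "l")

l1 <- Map(function(x, y) {rownames(x) <- x[[y]]; x}, my.list, ids)
res <- Reduce(function(x, y) merge(x, y, all = TRUE),
              lapply(l1, function(x) data.frame(x, id = rownames(x))))

# id is character (or a factor in R versions before 4.0), so convert it
# to integer before sorting
res$id <- as.integer(as.character(res$id))
res <- res[order(res$id), ]
rownames(res) <- NULL
```

Note that merge() without by joins on every column name the two inputs share. Only id is shared in this example; with overlapping column names elsewhere, passing by = "id" explicitly would be safer.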

Tags: merge, r

Lukas
