Merge a list of data frames by different ids

Tags: merge, r
I have lists of variable length containing data frames. I want to merge the data frames in each list into a single data frame, using a column name or index that differs from one data frame to the next. Here is an example with three data frames:

my.list <- list(
data.frame(a = 1:10, b = letters[1:10], c = 101:110),
data.frame(d = 6:15, e = letters[1:10], f = 1:10),
data.frame(l = 2:11, m = letters[11:20], o = 1:10))

and I want to merge on one specific column of each data frame, named in ids

ids <- c('a', 'f', 'l')

to get something that looks like

id  b   c   d   e   m   o
1   a   101 6   a   NA  NA
2   b   102 7   b   k   1
3   c   103 8   c   l   2
4   d   104 9   d   m   3
5   e   105 10  e   n   4
6   f   106 11  f   o   5
7   g   107 12  g   p   6
8   h   108 13  h   q   7
9   i   109 14  i   r   8
10  j   110 15  j   s   9
11  NA  NA  NA  NA  t   10

I've tried to do this with merge and/or Reduce, but couldn't work out how to pass the ids through.

Lukas asked Jul 21 '17 12:07


3 Answers

We can give the key column the same name in every list element by renaming the column listed in 'ids' to 'id', and then do the Reduce with merge

lst <- Map(function(x, y) {names(x)[match(y, names(x))] <- 'id'; x}, my.list, ids)
Reduce(function(...) merge(..., by = 'id', all = TRUE), lst)
#   id    b   c  d    e    m  o
#1   1    a 101  6    a <NA> NA
#2   2    b 102  7    b    k  1
#3   3    c 103  8    c    l  2
#4   4    d 104  9    d    m  3
#5   5    e 105 10    e    n  4
#6   6    f 106 11    f    o  5
#7   7    g 107 12    g    p  6
#8   8    h 108 13    h    q  7
#9   9    i 109 14    i    r  8
#10 10    j 110 15    j    s  9
#11 11 <NA>  NA NA <NA>    t 10
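
As a hedged sketch (not part of the answer as posted), the same rename-then-merge idea can be wrapped in a small helper that also checks that each id column exists. The name merge_by_ids and its key argument are hypothetical, and the sketch assumes no data frame already has a different column called 'id'.

```r
# Hypothetical helper: merge a list of data frames on per-frame key columns
# by renaming each key column to one shared name, then merging pairwise.
merge_by_ids <- function(dfs, ids, key = "id") {
  stopifnot(length(dfs) == length(ids))
  renamed <- Map(function(df, id) {
    pos <- match(id, names(df))
    if (is.na(pos)) stop("column '", id, "' not found")
    names(df)[pos] <- key   # only the key column is renamed
    df
  }, dfs, ids)
  # Full outer join, folded over the list from left to right
  Reduce(function(x, y) merge(x, y, by = key, all = TRUE), renamed)
}

my.list <- list(
  data.frame(a = 1:10, b = letters[1:10], c = 101:110),
  data.frame(d = 6:15, e = letters[1:10], f = 1:10),
  data.frame(l = 2:11, m = letters[11:20], o = 1:10))
ids <- c("a", "f", "l")

res <- merge_by_ids(my.list, ids)
```

Failing early on a missing id avoids silently renaming nothing and then getting an error from merge that is harder to trace.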
akrun answered Oct 27 '22 16:10


Here is a data.table answer with a similar approach as @akrun's answer.

However, instead of renaming the columns, we'll set them as keys. Then we can merge by keys, rather than by name. This preserves the column names.

library(data.table)

funky <- function(x) {
  setDT(my.list[[x]])
  setkeyv(my.list[[x]], ids[x])
  return(NULL)
}

The function takes an index x. First, it converts the data.frame in the xth position of my.list to a data.table by reference. Then it sets the key of that data.table to the column named at the same position in ids. Since all of this is done in place, it returns NULL to avoid useless printout.

Now apply the function to all of the objects in the list.

a <- lapply(seq_along(ids), funky)
Reduce(function(x, y) merge(x, 
                            y, 
                            by.x = key(x), 
                            by.y = key(y), 
                            all = TRUE), 
       my.list)

Unpacking the Reduce: by.x = key(x) and by.y = key(y) tell merge which column to join on in each table. This is the step that lets us avoid modifying the column names.

#      a  b   c  d  e  m  o
#  1:  1  a 101  6  a NA NA
#  2:  2  b 102  7  b  k  1
#  3:  3  c 103  8  c  l  2
#  4:  4  d 104  9  d  m  3
#  5:  5  e 105 10  e  n  4
#  6:  6  f 106 11  f  o  5
#  7:  7  g 107 12  g  p  6
#  8:  8  h 108 13  h  q  7
#  9:  9  i 109 14  i  r  8
# 10: 10  j 110 15  j  s  9
# 11: 11 NA  NA NA NA  t 10
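
The approach above converts my.list in place. If that side effect is unwanted, a variant that builds keyed copies could look like the sketch below. It is an assumption-laden sketch rather than part of the answer as posted: the name keyed is hypothetical, and it relies on as.data.table() returning a copy and on the merge result keeping the by.x column as its key, as the answer's own Reduce call does.

```r
library(data.table)

my.list <- list(
  data.frame(a = 1:10, b = letters[1:10], c = 101:110),
  data.frame(d = 6:15, e = letters[1:10], f = 1:10),
  data.frame(l = 2:11, m = letters[11:20], o = 1:10))
ids <- c("a", "f", "l")

# Build keyed data.table copies; the data frames in my.list stay untouched.
keyed <- Map(function(df, id) {
  dt <- as.data.table(df)  # copy of the data frame as a data.table
  setkeyv(dt, id)          # set the key by reference, on the copy only
  dt
}, my.list, ids)

# Same merge as above: join each pair on their respective keys.
res <- Reduce(function(x, y) merge(x, y,
                                   by.x = key(x),
                                   by.y = key(y),
                                   all = TRUE),
              keyed)
```

The trade-off is one extra copy of each table in exchange for leaving the input list as plain data frames.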
Eric Watt answered Oct 27 '22 15:10


One idea is to copy the columns of interest into the row names, expose the row names as a common id column, and merge on that, i.e.

l1 <- Map(function(x, y) {rownames(x) <- x[[y]]; x}, my.list, ids)
Reduce(function(x, y) merge(x, y, all = TRUE),
       lapply(l1, function(x) data.frame(x, id = rownames(x))))

#   id  a    b   c  d    e  f  l    m  o
#1   1  1    a 101  6    a  1 NA <NA> NA
#2  10 10    j 110 15    j 10 10    s  9
#3   2  2    b 102  7    b  2  2    k  1
#4   3  3    c 103  8    c  3  3    l  2
#5   4  4    d 104  9    d  4  4    m  3
#6   5  5    e 105 10    e  5  5    n  4
#7   6  6    f 106 11    f  6  6    o  5
#8   7  7    g 107 12    g  7  7    p  6
#9   8  8    h 108 13    h  8  8    q  7
#10  9  9    i 109 14    i  9  9    r  8
#11 11 NA <NA>  NA NA <NA> NA 11    t 10
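
In the output above, id sorts as text ("1", "10", "2", ...) because row names are character strings. A hedged sketch for restoring numeric order follows, assuming the key values are numeric and unique within each data frame, as they are in the example. The name res is hypothetical.

```r
my.list <- list(
  data.frame(a = 1:10, b = letters[1:10], c = 101:110),
  data.frame(d = 6:15, e = letters[1:10], f = 1:10),
  data.frame(l = 2:11, m = letters[11:20], o = 1:10))
ids <- c("a", "f", "l")

# Same steps as above: key column -> row names -> shared 'id' column
l1 <- Map(function(x, y) {rownames(x) <- x[[y]]; x}, my.list, ids)
res <- Reduce(function(x, y) merge(x, y, all = TRUE),
              lapply(l1, function(x) data.frame(x, id = rownames(x))))

# Convert id back to numeric (as.character first, in case it is a factor),
# then reorder the rows and reset the row names.
res$id <- as.numeric(as.character(res$id))
res <- res[order(res$id), ]
rownames(res) <- NULL
```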
Sotos answered Oct 27 '22 17:10