Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I append columns from a list of data frames?

Tags:

dataframe

r

Suppose I have the following list of dataframes:

lst <- list(
a=data.frame(key=c(1,2,3), val=c(10,20,30)), 
b=data.frame(key=c(1,2,3), val=c(100,200,300)), 
c=data.frame(key=c(1,2,3), val=c(1000,2000,3000)), 
d=data.frame(key=c(1,2), val=c(10000,20000)))

How do I create a data.frame as follows? (or similar):

  key val.a val.b val.c val.d
1   1    10   100  1000 10000
2   2    20   200  2000 20000
3   3    30   300  3000 NA

I tried to do it this way (but FAILED):

Browse[2]> Reduce(function(x,y) merge(x, y, by = 'key', all.x = T), lst)
  key val.x val.y val.x val.y
1   1    10   100  1000 10000
2   2    20   200  2000 20000
3   3    30   300  3000 NA
Warning message:
In merge.data.frame(x, y, by = "key", all.x = T) :
  column names ‘val.x’, ‘val.y’ are duplicated in the result

NOTE: I would prefer a base-R solution but am interested in other ways of doing this

like image 375
Denis Avatar asked Jan 25 '23 03:01

Denis


2 Answers

We can use inner_join with reduce after renaming the second column with the corresponding list name

library(purrr)
library(dplyr)
library(stringr)
imap(lst, ~ { nm <- .y
              .x %>% 
                rename_at(vars(val), ~ str_c(., ".", nm))}) %>% 
    reduce(full_join, by = 'key')
#    key val.a val.b val.c val.d
#1   1    10   100  1000 10000
#2   2    20   200  2000 20000
#3   3    30   300  3000 30000

Or in base R, we use Map to do the renaming of the column, then as in the OP's post, do the merge within Reduce

Reduce(function(...) merge(..., by = 'key', all = TRUE), 
       Map(function(x, y) setNames(x, c('key',
           paste0(names(x)[-1], ".", y))), lst, names(lst)))
#  key val.a val.b val.c val.d
#1   1    10   100  1000 10000
#2   2    20   200  2000 20000
#3   3    30   300  3000 30000
like image 171
akrun Avatar answered Jan 29 '23 20:01

akrun


If you are not too fussed about the column names etc looking exactly they way you have it, you can use the following:

df_merged = do.call(cbind,lst) %>% 
  select(key = a.key, 2,4,6,8) 

with output:

   key a.val b.val c.val d.val
1     1    10   100  1000 10000
2     2    20   200  2000 20000
3     3    30   300  3000 30000
like image 22
Luca Pontiggia Avatar answered Jan 29 '23 20:01

Luca Pontiggia