 

How to join multiple data frames using dplyr?


I want to left_join multiple data frames:

    library(dplyr)

    dfs <- list(
      df1 = data.frame(a = 1:3, b = c("a", "b", "c")),
      df2 = data.frame(c = 4:6, b = c("a", "c", "d")),
      df3 = data.frame(d = 7:9, b = c("b", "c", "e"))
    )

    Reduce(left_join, dfs)
    #   a b  c  d
    # 1 1 a  4 NA
    # 2 2 b NA  7
    # 3 3 c  5  8

This works because they all have the same b column, but Reduce doesn't let me pass additional arguments through to left_join. Is there a workaround for a case like this, where the key columns have different names?

    dfs <- list(
      df1 = data.frame(a = 1:3, b = c("a", "b", "c")),
      df2 = data.frame(c = 4:6, d = c("a", "c", "d")),
      df3 = data.frame(d = 7:9, b = c("b", "c", "e"))
    )

Update

This kind of works:

    Reduce(function(...) left_join(..., by = c("b" = "d")), dfs)

but when by has more than one element it gives this error:

    Error: cannot join on columns 'b' x 'd': index out of bounds

nachocab asked Dec 17 '15 21:12




2 Answers

I know this is late; I only discovered the unanswered questions section today. Sorry to bother.

Using left_join():

    library(dplyr)

    dfs <- list(
      df1 = data.frame(b = c("a", "b", "c"), a = 1:3),
      df2 = data.frame(d = c("a", "c", "d"), c = 4:6),
      df3 = data.frame(b = c("b", "c", "e"), d = 7:9)
    )

    # Join two data frames on their first columns, whatever those are named.
    func <- function(...) {
      df1 <- list(...)[[1]]
      df2 <- list(...)[[2]]
      col1 <- colnames(df1)[1]
      col2 <- colnames(df2)[1]
      # setNames(col2, col1) builds by = c(<col1> = <col2>)
      left_join(..., by = setNames(col2, col1))
    }

    Reduce(func, dfs)
    #   b a  c  d
    # 1 a 1  4 NA
    # 2 b 2 NA  7
    # 3 c 3  5  8

Using merge():

    func <- function(...) {
      df1 <- list(...)[[1]]
      df2 <- list(...)[[2]]
      col1 <- colnames(df1)[1]
      col2 <- colnames(df2)[1]
      # all.x = TRUE keeps every row of the left data frame (a left join)
      merge(..., by.x = col1, by.y = col2, all.x = TRUE)
    }

    Reduce(func, dfs)
    #   b a  c  d
    # 1 a 1  4 NA
    # 2 b 2 NA  7
    # 3 c 3  5  8
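Both versions above assume the key is the first column of each data frame. If that does not hold, one possible variation is to rename every key column to a shared name first, so a single by fits every step of the fold. This is only a sketch using the question's second example; key_cols and the shared name "key" are hypothetical names chosen for illustration.

```r
dfs <- list(
  df1 = data.frame(a = 1:3, b = c("a", "b", "c")),
  df2 = data.frame(c = 4:6, d = c("a", "c", "d")),
  df3 = data.frame(d = 7:9, b = c("b", "c", "e"))
)

# Hypothetical lookup: which column is the join key in each data frame.
key_cols <- c(df1 = "b", df2 = "d", df3 = "b")

# Rename each key column to the shared name "key".
renamed <- Map(function(df, key) {
  names(df)[names(df) == key] <- "key"
  df
}, dfs, key_cols)

# Every step can now use the same by; all.x = TRUE makes it a left join.
joined <- Reduce(function(x, y) merge(x, y, by = "key", all.x = TRUE), renamed)
joined
#   key a  c  d
# 1   a 1  4 NA
# 2   b 2 NA  7
# 3   c 3  5  8
```

The same renamed list should also work with left_join(x, y, by = "key") in place of merge(), if you prefer to stay in dplyr.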
joel.wilson answered Oct 28 '22 11:10



Would this work for you?

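    # df1, df2, df3 are the data frames from the question's second example:
    # df2 calls its key column d, while df1 and df3 call it b.
    jnd.tbl <- df1 %>%
      left_join(df2, by = c("b" = "d")) %>%
      left_join(df3, by = "b")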
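For a longer list, a chain like this can probably be written as a fold that takes one by specification per join. A minimal sketch, assuming the question's second example and a hypothetical keys list (not something dplyr provides) holding the by value for each step:

```r
library(dplyr)

dfs <- list(
  df1 = data.frame(a = 1:3, b = c("a", "b", "c")),
  df2 = data.frame(c = 4:6, d = c("a", "c", "d")),
  df3 = data.frame(d = 7:9, b = c("b", "c", "e"))
)

# Hypothetical list: one `by` per join step.
# Step 1 matches b (left) to d (right); step 2 matches b to b.
keys <- list(c("b" = "d"), "b")

# Fold over the step indices, starting from the first data frame.
# At step i, the accumulated result is joined to dfs[[i + 1]].
joined <- Reduce(
  function(acc, i) left_join(acc, dfs[[i + 1]], by = keys[[i]]),
  seq_along(keys),
  dfs[[1]]
)
joined
#   a b  c  d
# 1 1 a  4 NA
# 2 2 b NA  7
# 3 3 c  5  8
```

Each by refers to the columns of the accumulated result on the left, so a key name has to survive the earlier joins to be usable in a later step.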
elesk01s answered Oct 28 '22 11:10
