I want to left_join multiple data frames:

    dfs <- list(
      df1 = data.frame(a = 1:3, b = c("a", "b", "c")),
      df2 = data.frame(c = 4:6, b = c("a", "c", "d")),
      df3 = data.frame(d = 7:9, b = c("b", "c", "e"))
    )

    Reduce(left_join, dfs)
    #   a b  c  d
    # 1 1 a  4 NA
    # 2 2 b NA  7
    # 3 3 c  5  8

This works because they all have the same b column, but Reduce doesn't let me specify additional arguments to pass to left_join. Is there a workaround for a case like this, where the key columns are named differently?

    dfs <- list(
      df1 = data.frame(a = 1:3, b = c("a", "b", "c")),
      df2 = data.frame(c = 4:6, d = c("a", "c", "d")),
      df3 = data.frame(d = 7:9, b = c("b", "c", "e"))
    )

Update
This kind of works:

    Reduce(function(...) left_join(..., by = c("b" = "d")), dfs)

but when by has more than one element it gives this error:

    Error: cannot join on columns 'b' x 'd': index out of bounds

To join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order. If data frame A has variables that data frame B does not, one option is to delete the extra variables in data frame A before binding.
We can merge two data frames in R with the merge() function or with the join family of functions in the dplyr package. By default, merge() joins on the columns the two data frames have in common; when the key columns are named differently, pass by.x and by.y. The merge() function is similar to a database join in SQL.
To join by different variables on x and y , use a named vector. For example, by = c("a" = "b") will match x$a to y$b . To join by multiple variables, use a vector with length > 1. For example, by = c("a", "b") will match x$a to y$a and x$b to y$b .
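As a minimal sketch of the two forms of by described above: the data frames x, y, x2 and y2 below are made up for illustration and are not part of the question.

```r
library(dplyr)

# Illustrative data only: the key is called `a` in x and `b` in y.
x <- data.frame(a = c("k1", "k2", "k3"), v = 1:3)
y <- data.frame(b = c("k1", "k3", "k4"), w = 4:6)

# Named vector: match x$a to y$b. The result keeps the key name from x.
by_named <- left_join(x, y, by = c("a" = "b"))

# Illustrative data with a two-column key that has the same names on both sides.
x2 <- data.frame(a = c(1, 1, 2), b = c("u", "v", "u"), v = 1:3)
y2 <- data.frame(a = c(1, 2), b = c("v", "u"), w = c(10, 20))

# Unnamed vector of length 2: match x2$a to y2$a and x2$b to y2$b.
by_multi <- left_join(x2, y2, by = c("a", "b"))
```

Rows of x or x2 without a match keep their values and get NA in the columns that come from the right-hand data frame.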
To combine two data frames with the same columns in R, call rbind() and pass the two data frames as arguments. rbind() returns the data frame created by concatenating the two. For rbind() to combine them, the column names must match.
I know this is late; I only discovered the unanswered questions section today. Sorry to bother.
Using left_join()

    library(dplyr)

    # The key column is placed first in every data frame
    dfs <- list(
      df1 = data.frame(b = c("a", "b", "c"), a = 1:3),
      df2 = data.frame(d = c("a", "c", "d"), c = 4:6),
      df3 = data.frame(b = c("b", "c", "e"), d = 7:9)
    )

    # Join two data frames on whatever their first columns are called
    func <- function(x, y) {
      col1 <- colnames(x)[1]
      col2 <- colnames(y)[1]
      left_join(x, y, by = setNames(col2, col1))
    }

    Reduce(func, dfs)
    #   b a  c  d
    # 1 a 1  4 NA
    # 2 b 2 NA  7
    # 3 c 3  5  8

Using merge():

    # Same idea with base R: merge on the first column of each data frame
    func <- function(x, y) {
      col1 <- colnames(x)[1]
      col2 <- colnames(y)[1]
      # all.x = TRUE keeps every row of x, like a left join
      merge(x, y, by.x = col1, by.y = col2, all.x = TRUE)
    }

    Reduce(func, dfs)
    #   b a  c  d
    # 1 a 1  4 NA
    # 2 b 2 NA  7
    # 3 c 3  5  8

Would this work for you? It chains the joins explicitly, so each step can have its own by:

    jnd.tbl <- dfs$df1 %>%
      left_join(dfs$df2, by = c("b" = "d")) %>%
      left_join(dfs$df3, by = "b")

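If the list is longer, the same idea of one by per join can be folded back into Reduce. The following is only a sketch: the bys list is a hypothetical helper that is not in the original question, and it assumes you know the key for each step in advance.

```r
library(dplyr)

dfs <- list(
  df1 = data.frame(a = 1:3, b = c("a", "b", "c")),
  df2 = data.frame(c = 4:6, d = c("a", "c", "d")),
  df3 = data.frame(d = 7:9, b = c("b", "c", "e"))
)

# Hypothetical helper: bys[[i]] is the `by` used when joining dfs[[i + 1]]
# onto the result accumulated so far.
bys <- list(c("b" = "d"), "b")

# Fold over the step indices rather than over the data frames themselves,
# starting from the first data frame as the initial value.
joined <- Reduce(
  function(acc, i) left_join(acc, dfs[[i + 1]], by = bys[[i]]),
  seq_along(bys),
  dfs[[1]]
)
```

Iterating over indices keeps the function passed to Reduce at two arguments while still letting each step look up its own by.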