
outer join data.table R

Tags: r, data.table

Is there an efficient way to do outer joins with data.table, such as the following?

library(data.table)
a <- data.table(a=c(1,2,3),b=c(3,4,5))
b <- data.table(a=c(1,2),k=c(1,2))
merge(a,b,by="a",all.x=TRUE)

This works fine, but on bigger data it is really slow compared with the inner join below, which runs very fast.

setkey(a,a)
setkey(b,a)
a[b,]
jamborta asked Nov 21 '12 12:11




1 Answer

With both tables keyed on a, b[a,] is the "outer join" you're looking for: it keeps every row of a and fills k with NA where b has no match.
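As a minimal sketch using the tables from the question, the join can be checked against the merge() call it replaces. The on= argument here is my substitution for the setkey() calls and assumes a data.table version that supports it; the names left and merged are only for illustration.

```r
library(data.table)

a <- data.table(a = c(1, 2, 3), b = c(3, 4, 5))
b <- data.table(a = c(1, 2), k = c(1, 2))

# X[Y] looks up each row of Y in X, so b[a] keeps every row of a.
# The row with a = 3 has no match in b, so its k is NA.
left <- b[a, on = "a"]

# The merge() call from the question, for comparison.
merged <- merge(a, b, by = "a", all.x = TRUE)

# Same rows and values; only the column order differs
# (b[a] puts b's columns first, merge() puts a's first).
print(left)
print(merged)
```

If the tables are already keyed with setkey(), plain b[a] gives the same rows without on=.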

Take a look at ?merge.data.table for more specifics.
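For the other outer joins, a rough sketch with merge() and its all and all.y arguments follows. The tables x and y are hypothetical, not from the question; they are chosen so that each has one id missing from the other.

```r
library(data.table)

# Hypothetical tables: id 3 exists only in x, id 4 only in y.
x <- data.table(id = c(1, 2, 3), v = c(3, 4, 5))
y <- data.table(id = c(1, 2, 4), k = c(1, 2, 4))

# Full outer join: every id from either table, NA where a side is missing.
full <- merge(x, y, by = "id", all = TRUE)

# Right outer join: every row of y, NA in v where x has no match.
right <- merge(x, y, by = "id", all.y = TRUE)

print(full)
print(right)
```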

Justin answered Sep 17 '22 16:09
