Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generate CROSS JOIN queries with dbplyr

Tags:

sql

r

dplyr

dbplyr

Given 2 remote tables (simulated with tbl_lazy for this example)

library("dplyr")
library("dbplyr")

t1 <- tbl_lazy(df = iris, src = dbplyr::simulate_mysql())
t2 <- tbl_lazy(df = mtcars, src = dbplyr::simulate_mysql())

How can I perform an actual* cross join between t1 and t2 using R and dbplyr?

* i.e. using CROSS JOIN in the translated SQL query

Note that I know how to perform all the other types of joins, this is precisely about CROSS joins.

I am aware of the following trick:

joined <- t1 %>%
  mutate(tmp = 1) %>%
  full_join(mutate(t2, tmp = 1), by = "tmp") %>%
  select(-tmp)

However

  1. This is ugly (even if it could be hidden in a function)
  2. I would like to take advantage of the highly optimised join capabilities of the DB, so I'd like to pass a real SQL CROSS JOIN. Using show_query(joined) shows that the generated SQL query uses LEFT JOIN.

Sadly, there is no cross_join operator in dplyr and sql_join(t1, t2, type = "cross") does not work either (not implemented for tbls, works only on DB connections).

How can I generate an SQL CROSS JOIN with dbplyr?

like image 788
asachet Avatar asked Jul 16 '19 16:07

asachet


1 Answers

According to the dbplyr NEWS file, since version 1.10, if you use a full_join(..., by = character()), it will "promote" the join to a cross join. This doesn't seem to be documented anywhere else yet, but searching the dbplyr Github repo for "cross" turned it up in both code and the NEWS file.

This syntax does not seem to work for local data frames, only via SQL.

like image 153
Gregor Thomas Avatar answered Sep 30 '22 13:09

Gregor Thomas