how to generate SQL from dbplyr without a database connection?

I currently have access to an Apache Hive database via the beeline CLI, and we are still negotiating with IT to get R on the server. Until then, I would like to (ab)use the R dbplyr package to generate SQL queries on another machine, copy them over, and run them as raw SQL. I have used sql_render in dbplyr before when I had a valid database connection, but I do not know how to do this without one. The ideal case, for me, would be something like:

con <- dummy_connection('hive')   # this does not exist, I think
qry <- tbl(con,'mytable') %>%     # complex logic to build a query
  select(var1,var2) %>%
  filter(var1 > 0)   # etc...
sql_render(qry) %>%               # cat it to a file to be used on another machine.
  as.character() %>%
  cat() 

Is there a way to make this 'dummy' connection? And can it be done in such a way that I can specify the variant of SQL?

steveo'america asked Mar 02 '18 22:03



1 Answer

You can generate an in-memory SQLite database using just R:

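library(DBI)
library(RSQLite)
library(tidyverse)  # attaches dplyr and ggplot2 (ggplot2 provides the diamonds data)
library(dbplyr)

# Create a throwaway SQLite database that lives only in memory
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy the diamonds data frame into the database as a table
data("diamonds")
dbWriteTable(con, "diamonds", diamonds)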

With an in-memory SQLite database and a DBI connection to it, you should be able to (ab)use dbplyr against that connection to get R to write SQL for you.
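
As a minimal sketch of that workflow, the snippet below builds a query against an in-memory SQLite table and saves the rendered SQL as text. It uses the built-in mtcars data as a stand-in for your real table, and the output file name query.sql is a hypothetical placeholder.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# In-memory SQLite database with a small stand-in table (mtcars ships with R)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# Build the query lazily; nothing is executed against the table yet
qry <- tbl(con, "mtcars") %>%
  select(mpg, cyl) %>%
  filter(mpg > 20)

# sql_render() returns an object of class "sql"; convert it to plain text
sql_text <- as.character(sql_render(qry))

# Write the SQL to a file (hypothetical name) so it can be copied elsewhere
out_file <- "query.sql"
writeLines(sql_text, out_file)

dbDisconnect(con)
```

The generated text is in SQLite's dialect, so quoting and function names may need manual adjustment before it runs on Hive.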

This produces SQLite SQL rather than Hive SQL, but hopefully it is still a useful stepping stone from R to SQLite to Hive (or your preferred SQL dialect).
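
If the dialect matters, dbplyr also ships simulated connections such as simulate_hive(), which can be combined with lazy_frame() so that no database is needed at all. The sketch below assumes a recent dbplyr (2.x, where lazy_frame() accepts .name) and reuses the table and column names from the question; the dummy values 1 and 2 only declare the columns.

```r
library(dplyr)
library(dbplyr)

# lazy_frame() creates a table stub with the given columns; no database is
# contacted. simulate_hive() selects the Hive SQL translation rules.
# Assumption: dbplyr 2.x, where the .name argument sets the table name.
mytable <- lazy_frame(var1 = 1, var2 = 2, con = simulate_hive(), .name = "mytable")

# Same query shape as in the question
qry <- mytable %>%
  select(var1, var2) %>%
  filter(var1 > 0)

# Render to plain text and print it, ready to paste into beeline
hive_sql <- as.character(sql_render(qry))
cat(hive_sql, "\n")
```

Because the stub never touches real data, column names and types are whatever you declare, so a typo in a column name will only surface when the SQL runs on the real server.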

Also see the following links:

  • SQLite vignette
  • Bradley's demo (source of above code)
Simon.S.A. answered Sep 24 '22 04:09
