I currently have access to an Apache Hive database via the beeline
CLI. We are still negotiating with IT to get R
on the server. Until then, I would like to (ab)use the R dbplyr package to generate SQL queries on another machine, copy them over, and run them as raw SQL. I have used sql_render from dbplyr in the past when I had a valid database connection, but I do not know how to do this without one. The ideal case for me would be something like:
con <- dummy_connection('hive')   # this does not exist, I think
qry <- tbl(con, 'mytable') %>%    # complex logic to build a query
  select(var1, var2) %>%
  filter(var1 > 0)                # etc...
sql_render(qry) %>%               # cat it to a file to be used on another machine
  as.character() %>%
  cat()
Is there a way to make this 'dummy' connection? And can it be done in such a way that I can specify the variant of SQL?
You can generate an in-memory SQLite database using just R:
library(DBI)
library(RSQLite)
library(tidyverse)   # ggplot2 (attached via tidyverse) supplies the diamonds example data
library(dbplyr)

# Create an in-memory SQLite database and copy an example table into it
con <- dbConnect(RSQLite::SQLite(), ":memory:")
data("diamonds")
dbWriteTable(con, "diamonds", diamonds)
With an in-memory SQLite database and a live connection to it, you should be able to (ab)use dbplyr to get R to write SQL for you.
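For example, a minimal sketch against the diamonds table created above (carat and price are columns of that example data):

qry <- tbl(con, "diamonds") %>%
  select(carat, price) %>%
  filter(carat > 1)

# Print the generated SQL; cat(..., file = "query.sql") would write it to a file instead
sql_render(qry) %>%
  as.character() %>%
  cat()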
This is only SQLite rather than Hive, but hopefully it is still an accelerator to go from R to SQLite to Hive (or your preferred SQL dialect).
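If you specifically need Hive-flavoured SQL, it may also be worth looking at dbplyr's simulated connections: recent dbplyr versions export simulate_hive() and lazy_frame(), which let you render SQL without any database connection at all. A rough sketch, assuming your dbplyr version has these functions (the lazy frame stands in for the remote table and appears as df in the generated SQL, so you would swap in your real table name before running it):

library(dplyr)
library(dbplyr)

# lazy_frame() is a stand-in for the remote table; simulate_hive() selects the SQL dialect
qry <- lazy_frame(var1 = numeric(), var2 = numeric(), con = simulate_hive()) %>%
  select(var1, var2) %>%
  filter(var1 > 0)

# Render the Hive-flavoured SQL and write it to a file to copy over to beeline
sql_render(qry) %>%
  as.character() %>%
  cat(file = "query.sql")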