Is there a way to have dplyr, when connected to a database, pipe data into a new table within that database without ever downloading the data locally?
I'd like to do something along the lines of:
tbl(con, "mytable") %>%
group_by(dt) %>%
tally() %>%
write_to(name = "mytable_2", schema = "transformed")
dplyr is an R package that provides a set of grammar-based functions for transforming data. Compared with hand-written SQL, a dplyr pipeline is often easier to construct and easier to read.
You can query your data with DBI by using the dbGetQuery() function. Simply pass your SQL code to the function as a quoted string. This method is sometimes referred to as pass-through SQL, and it is probably the simplest way to query your data. Take care to escape any quotes inside the SQL string as needed.
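As a minimal sketch of pass-through SQL, assuming an in-memory SQLite database (via the RSQLite package) and a small made-up `flights` table used purely for illustration:

```r
library(DBI)

# In-memory SQLite database, used only so the example is self-contained
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical toy data, not the real nycflights13 table
dbWriteTable(con, "flights",
             data.frame(carrier = c("AA", "UA", "AA"), dep_delay = c(5, 10, -2)))

# The SQL is sent to the database as-is. The R string uses double quotes,
# so the single quotes around 'AA' need no escaping.
by_carrier <- dbGetQuery(
  con,
  "SELECT carrier, COUNT(*) AS n FROM flights WHERE carrier = 'AA' GROUP BY carrier"
)

# Values that come from user input are safer as bound parameters
# than pasted into the SQL string.
bound <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM flights WHERE carrier = ?",
                    params = list("AA"))

dbDisconnect(con)
```

Both queries run entirely in the database; only the small result data frames come back to R.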
We can use R to create a new database and its associated structure (also known as the schema) from existing CSV files.
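One way this might look, assuming SQLite as the database and a throwaway CSV file written to a temporary path (the file, the `people` table, and its columns are all made up for the example):

```r
library(DBI)

# Write a tiny CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, name = c("a", "b", "c")), csv_path, row.names = FALSE)

# Connecting to a path that does not exist yet creates the SQLite database file
db_path <- tempfile(fileext = ".sqlite3")
con <- dbConnect(RSQLite::SQLite(), db_path)

# The table's column names and types are inferred from the data frame
people <- read.csv(csv_path, stringsAsFactors = FALSE)
dbWriteTable(con, "people", people)

tables <- dbListTables(con)
fields <- dbListFields(con, "people")

dbDisconnect(con)
```

For files too large to read into memory at once, the same idea can be applied in chunks by calling dbWriteTable() with `append = TRUE`.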
To connect to a MySQL database, R needs the RMySQL package. The dbConnect() function creates the connection object; it takes the username, password, hostname, and database name as arguments.
While I wholeheartedly agree with the suggestion to learn SQL, you can take advantage of the fact that dplyr doesn't pull data until it absolutely has to: build the query using dplyr, wrap the generated SQL in a CREATE TABLE ... AS statement, and then run that statement in the database using DBI::dbExecute(), as in:
# CREATE A DATABASE WITH A 'FLIGHTS' TABLE
library(DBI)
library(dplyr)
library(dbplyr)
library(nycflights13)
my_db <- dbConnect(RSQLite::SQLite(), "~/my_db.sqlite3")
flights_sqlite <- copy_to(my_db, flights, "flights", temporary = FALSE, indexes = list(
  c("year", "month", "day"), "carrier", "tailnum"))
# BUILD A QUERY (LAZY: NOTHING RUNS IN THE DATABASE YET)
QUERY <- filter(flights_sqlite, year == 2013, month == 1, day == 1) %>%
  select(year, month, day, carrier, dep_delay, air_time, distance) %>%
  mutate(speed = distance / air_time * 60) %>%
  arrange(year, month, day, carrier)
# WRAP THE GENERATED SQL IN "CREATE TABLE ... AS" AND EXECUTE IT IN THE DATABASE
dbExecute(my_db, paste("CREATE TABLE foo AS", sql_render(QUERY)))
You could even write a little function that does this:
to_table <- function(qry, tbl) {
  # Run CREATE TABLE <tbl> AS <query> on the connection the lazy query came from
  DBI::dbExecute(dbplyr::remote_con(qry),
                 paste("CREATE TABLE", tbl, "AS", dbplyr::sql_render(qry)))
}
and pipe the query into that function like so:
filter(flights_sqlite, year == 2013, month == 1, day == 1) %>%
  select(year, month, day, carrier, dep_delay, air_time, distance) %>%
  mutate(speed = distance / air_time * 60) %>%
  arrange(year, month, day, carrier) %>%
  to_table('foo2')  # a new name, since 'foo' already exists if you ran the statement above
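For anyone without nycflights13 installed, here is a smaller self-contained sketch of the same idea, assuming an in-memory SQLite database and the built-in mtcars data (the table names `cyl_counts` and `cyl_counts_2` are made up). It also shows compute() with `temporary = FALSE`, which as far as I know does the same job without hand-assembling SQL; treat that part as an alternative to try rather than the method described above:

```r
library(DBI)
library(dplyr)
library(dbplyr)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars, "mtcars", temporary = FALSE)

# Lazy query: nothing is executed or downloaded yet
qry <- tbl(con, "mtcars") %>%
  group_by(cyl) %>%
  tally()

# Option 1: wrap the generated SELECT in CREATE TABLE ... AS and run it in the database
stmt <- paste("CREATE TABLE cyl_counts AS", sql_render(qry))
dbExecute(con, stmt)

# Option 2: let dbplyr materialise the query as a permanent table
compute(qry, name = "cyl_counts_2", temporary = FALSE)

# Only these small checks pull anything back into R
tables <- dbListTables(con)
counts <- dbGetQuery(con, "SELECT cyl, n FROM cyl_counts ORDER BY cyl")

dbDisconnect(con)
```

Note that CREATE TABLE ... AS works in SQLite, PostgreSQL and MySQL, but some databases (SQL Server, for example) use SELECT ... INTO instead, so the hand-built statement may need adjusting for your backend.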