write table in database with dplyr

Is there a way to have dplyr, when connected to a database, pipe the result of a query into a new table within that same database, without ever downloading the data locally?

I'd like to do something along the lines of:

tbl(con, "mytable") %>%
   group_by(dt) %>%
   tally() %>%
   write_to(name = "mytable_2", schema = "transformed")
asked Apr 26 '15 13:04 by jenswirf


1 Answer

While I wholeheartedly agree with the suggestion to learn SQL, you can take advantage of the fact that dplyr doesn't pull data until it absolutely has to: build the query using dplyr, wrap the generated SQL in a CREATE TABLE ... AS statement, and then run that statement inside the database with DBI::dbExecute(), as in:

# CREATE A DATABASE WITH A 'FLIGHTS' TABLE
library(DBI)
library(RSQLite)
library(dplyr)          # database backends also need the dbplyr package installed
library(nycflights13)
my_db <- dbConnect(SQLite(), "~/my_db.sqlite3")
flights_sqlite <- copy_to(my_db, flights, temporary = FALSE, indexes = list(
  c("year", "month", "day"), "carrier", "tailnum"))

# BUILD A QUERY (LAZY: NOTHING IS RUN OR DOWNLOADED YET)
QUERY <- filter(flights_sqlite, year == 2013, month == 1, day == 1) %>%
    select(year, month, day, carrier, dep_delay, air_time, distance) %>%
    mutate(speed = distance / air_time * 60) %>%
    arrange(year, month, day, carrier)

# WRAP THE GENERATED SQL IN "CREATE TABLE ... AS" AND EXECUTE IT IN THE DATABASE
dbExecute(my_db, paste("CREATE TABLE foo AS", dbplyr::sql_render(QUERY)))

You could even write a little function that does this:

to_table <- function(qry, tbl) {
    con <- dbplyr::remote_con(qry)   # the connection behind the lazy query
    DBI::dbExecute(con, paste("CREATE TABLE", DBI::dbQuoteIdentifier(con, tbl),
                              "AS", dbplyr::sql_render(qry)))
}

and pipe the query into that function like so (with a new table name, since 'foo' already exists at this point):

filter(flights_sqlite, year == 2013, month == 1, day == 1) %>%
    select(year, month, day, carrier, dep_delay, air_time, distance) %>%
    mutate(speed = distance / air_time * 60) %>%
    arrange(year, month, day, carrier) %>%
    to_table('foo2')
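
A possibly simpler route, which is not part of the original answer: recent dplyr/dbplyr versions provide compute(), which materialises a lazy query as a table inside the database without bringing the rows into R. The sketch below mirrors the group_by/tally pipeline from the question. It assumes an in-memory SQLite database and the built-in mtcars data purely for illustration, and the table name mtcars_by_cyl is made up.

```r
library(DBI)
library(dplyr)   # database backends also need the dbplyr package installed

# In-memory SQLite database, so the sketch leaves no files behind (assumption)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars, "mtcars", temporary = FALSE)

# Lazy pipeline: no SQL runs until compute() is reached.
# temporary = FALSE asks for a permanent table rather than a session-scoped one.
summary_tbl <- tbl(con, "mtcars") %>%
    group_by(cyl) %>%
    tally() %>%
    compute(name = "mtcars_by_cyl", temporary = FALSE)

# Check from the database side that the new table exists and has one row per group
tables   <- dbListTables(con)
n_groups <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM mtcars_by_cyl")$n
```

compute() returns a lazy tbl that points at the new table, so it can be piped further. Whether schema-qualified target names are accepted depends on the backend and the dbplyr version, so that part of the question is left out of this sketch.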
answered Oct 16 '22 23:10 by Jthorpe