Speed up odbc::dbFetch

I'm trying to use R on a Mac to analyze data stored in a SQL database (MS SQL Server). Typical queries might return a few GB of data, and the entire database is a few TB. So far I've been using the R package odbc, and it seems to work pretty well.

However, dbFetch() seems really slow. For example, a somewhat complex query returns all results in ~6 minutes in SQL Server, but if I run it with odbc and then call dbFetch, it takes close to an hour to get the full 4 GB into a data.frame. I've tried fetching in chunks (https://stackoverflow.com/a/59220710/8400969), which helps modestly. I'm wondering if there is another way to pipe the data to my Mac more quickly, and I like the line of thinking here: Quickly reading very large tables as dataframes
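For reference, the chunked approach from that answer looks roughly like this (a sketch, assuming an open odbc connection conn and a query string qry; the 500k batch size is arbitrary):

res <- DBI::dbSendQuery(conn, qry)
chunks <- list()
i <- 1L
while (!DBI::dbHasCompleted(res)) {
  chunks[[i]] <- DBI::dbFetch(res, n = 500000)  # fetch one batch of rows
  i <- i + 1L
}
DBI::dbClearResult(res)
df <- do.call(rbind, chunks)  # data.table::rbindlist(chunks) is faster for many chunks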

What are some strategies for speeding up dbFetch when the results of queries are a few GB of data? If the issue is generating a data.frame object from larger tables, are there savings available by "fetching" in a different manner? Are there other packages that might help?

Thanks for your ideas and suggestions!

1 Answer

I would suggest using the dbcooper package, found on GitHub: https://github.com/chriscardillo/dbcooper

I have found huge improvements in speed when querying large datasets.

First, add your connection to your environment:

conn <- DBI::dbConnect(odbc::odbc(),
                       Driver   = "",  # e.g. "ODBC Driver 18 for SQL Server"
                       Server   = "",
                       Database = "",
                       UID      = "",
                       PWD      = "")

devtools::install_github("chriscardillo/dbcooper")
library(dbcooper)

dbcooper::dbc_init(con    = conn,
                   con_id = "test",
                   tables = c("schema.table"))

This adds the function test_schema_table() to your environment, which returns a lazy reference to the table. To collect the data into your environment, use test_schema_table() %>% collect().
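Because the accessor returns a lazy dbplyr table, you can also compose dplyr verbs that run server-side before collecting. A sketch, using the schema.table registered above (some_column is a hypothetical column name):

library(dplyr)  # provides %>%, filter(), and collect()

df <- test_schema_table() %>%
  filter(some_column > 100) %>%  # hypothetical filter, translated to SQL and run on the server
  collect()                      # pulls only the filtered rows into R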

Here is a microbenchmark I ran to compare DBI and dbcooper.

mbm <- microbenchmark::microbenchmark(
  DBI      = DBI::dbGetQuery(conn, qry),        # dbGetQuery() fetches and clears the result set
  dbcooper = test_schema_table() %>% collect(), # accessor registered by dbc_init() above
  times = 5
)
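Printing mbm (or calling summary(mbm)) shows the timing distribution across the five runs.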

Here are the results:

[Image: Microbenchmark of DBI vs dbcooper]
