Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: Best Practices - dplyr and odbc multi table actions (retrieved from SQL)

Say you have your tables stores in an SQL server DB, and you want to perform multi table actions, i.e. join several tables from that same database.

Following code can interact and receive data from SQL server:

library(dplyr)
library(odbc)
con <- dbConnect(odbc::odbc(),
                 .connection_string = "Driver={SQL Server};Server=.;Database=My_DB;")

Table1 <- tbl(con, "Table1")
Table1 # View glimpse of Table1

Table2 <- tbl(con, "Table2")
Table2 # View glimpse of Table2

Table3 <- tbl(con, "Table3")

However, with a few results retrieved with the same connection, eventually following error occurs:

Error: [Microsoft][ODBC SQL Server Driver]Connection is busy with results for another hstmt

My current googling skills have taking me to the answer that the backend does not support multiple active result sets (MARS) - I guess more than 2 active result sets is the maximum? (backend is DBI and odbc)

So, my question is: what is best practice if I want to collect data from several tables from an SQL DB?

  • Open a connection for each table?

  • Actively close the connection and open it again for the next table?

  • Does the backend support MARS to be parsed to the connection string?

like image 749
Mathias Pagh Avatar asked Jan 24 '18 07:01

Mathias Pagh


1 Answers

To make a connection that can hold multiple result sets, I've had luck with following connection code:

con <- DBI::dbConnect(odbc::odbc(),
                      Driver = "SQL Server Native Client 11.0",
                      Server = "my_host",
                      UID = rstudioapi::askForPassword("Database UID"),
                      PWD = rstudioapi::askForPassword("Database PWD"),
                      Port = 1433,
                      MultipleActiveResultSets = "True",
                      Database = my_db)

On top of that, I found that the new pool-package can do the job:

pool <- dbPool(odbc::odbc(),
                      Driver = "SQL Server Native Client 11.0",
                      Server = "my_host",
                      UID = rstudioapi::askForPassword("Database UID"),
                      PWD = rstudioapi::askForPassword("Database PWD"),
                      Port = 1433,
                      MultipleActiveResultSets = "True",
                      Database = my_db)

It is quicker and more stable than the DBI connection, however, one minor drawback is that the database doesn't pop up in the connection tab for easy reference.

For both methods, remember to close the connection/pool when done. For the DBI-method its:

dbDisconnect(con)

Whereas the pool-method is closed by calling:

poolClose(pool)
like image 93
Mathias Pagh Avatar answered Oct 19 '22 20:10

Mathias Pagh