Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to manage a database connection in an R Package

I'm building an R package, the main purpose of which is to abstract away the pain of dealing with a proprietary database that requires some fairly complex SQL queries in order get data out.

As such, the connection to the Microsoft SQL Server (obtained by odbcDriverConnect) is a constant and important part of this package, but I can't work out how best to manage this and I'm hoping for advice as to how this should be implemented in R.

My current thoughts are:

  1. Make the user ensure they have a valid connection before they call any function. Each function then has connection as a parameter which must be passed. This puts a burden on the user.

  2. In every function, make a call to get.connection() which will get new connection each time. Old connections are then allowed to timeout naturally, which seems a sloppy approach.

  3. As above, but return the same connection each time. This appears not to be a viable proposition as I can't prevent connections from timing out through R. autoReconnect=TRUE and other tricks I've used in different languages seem to have no effect.

In Java, I would probably have a DatabaseConnectionPool populated with a number of connections and simply grab connections from, and return them to, that pool as needed. I also don't seem to have the timeout issue in Java when I specify autoReconnect=TRUE.

Any suggestions much appreciated.

like image 974
Ina Avatar asked Feb 08 '13 19:02

Ina


1 Answers

pool is an R package for pooling connections such as databases. If you're cool to use a github package, take a look at https://github.com/rstudio/pool. Will reuse or recreate the connection as required.

like image 154
dsz Avatar answered Nov 11 '22 11:11

dsz