Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Use dplyr with database without creating an explicit DBI object

Most code examples showing how to use dplyr with a database involve creating a database connection object:

connStr <- "driver=driver;server=hostname;database=mydatabase;..."
db <- DBI::dbConnect(odbc::odbc(), .connection_string=connStr)

tbl <- tbl(db, "mytable")
tbl %>% verb1 %>% verb2 %>% ...

However, suppose I omit creating a db object:

tbl <- tbl(DBI::dbConnect(odbc::odbc(), .connection_string=connStr), "mytable")
tbl %>% verb1 %>% verb2 %>% ...

Are there any consequences to this? Will I use up database resources/leak memory/etc?

The DBMS I have in mind is SQL Server, and the driver package is odbc, in case it matters.

like image 518
Hong Ooi Avatar asked Oct 30 '22 02:10

Hong Ooi


1 Answers

The new DBI specs assume that the caller frees all connection they allocate with dbConnect() with a corresponding call to dbDisconnect(). Failure to do so will close the connection only during garbage collection (or the end of the R session), thus delaying liberation of resources, or even leak the connection.

The exact behavior depends on the DBI backend involved (in this case the odbc package). According to Jim Hester, the maintainer of odbc,

[it] automatically calls dbDisconnect() when the connection object is garbage collected, so this won't leak connections. If you are opening a large number of connections it is always best to be explicit, if you are just doing this interactively it is probably ok to rely on the garbage collector in this case.

like image 73
krlmlr Avatar answered Nov 15 '22 07:11

krlmlr