I would like to understand the best practice for (re)using SQL connections to an MS SQL database through RJDBC.
I can imagine three possible scenarios:

1. Open a single connection when the application starts and reuse it everywhere.
2. Open a new connection for every request and close it afterwards.
3. Use a connection pool.
I'm using my code in a Shiny application with several dozen clients, and I'm afraid that something bad will happen if I use method 1. So I use method 2, creating a new connection for every request with the code below.

I can see some potential downsides of this approach: performance, taxing database resources, etc. But maybe I'm being too cautious, since R is single-threaded even in the Shiny usage scenario?
So my specific questions are:
A. Can I safely use a single connection to an MS SQL database through RJDBC throughout my Shiny application?
B. Are there any real downsides (memory leakage, performance, etc.) in scenario 2 above?
NewConnection <- function() {
  # Locate the Microsoft JDBC driver jar; the path differs per OS
  # (macOS, Linux, Windows).
  file <- NULL
  for (path in c('/Users/victor/Documents/R/sqljdbc_3.0/enu/sqljdbc4.jar',
                 '/home/oracle/sqljdbc_3.0/enu/sqljdbc4.jar',
                 'C:/Projects/jdbc/sqljdbc_4.0/enu/sqljdbc4.jar')) {
    if (file.exists(path)) {
      file <- path
      break
    }
  }
  if (is.null(file)) {
    return(NULL)
  }
  drv <- JDBC("com.microsoft.sqlserver.jdbc.SQLServerDriver", file)
  passwd <- GetUserNamePassword()
  conn <- dbConnect(drv, "jdbc:sqlserver://sql.server.address.com",
                    passwd$username, passwd$password)
  conn
}
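If you stay with scenario 2, one detail worth getting right is releasing the connection even when the query fails. A minimal sketch of a per-request connection inside a Shiny server function, using the NewConnection() helper above (the output id and table name are made up for illustration):

```r
library(shiny)
library(RJDBC)

server <- function(input, output, session) {
  output$results <- renderTable({
    conn <- NewConnection()
    if (is.null(conn)) return(NULL)
    # on.exit() guarantees the connection is closed even if the
    # query below throws an error.
    on.exit(dbDisconnect(conn), add = TRUE)
    dbGetQuery(conn, "SELECT TOP 10 * FROM some_table")
  })
}
```

Without the on.exit() call, a failed query would leak the connection until the R garbage collector happens to finalize it.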
P.S. Related: How to manage a database connection in an R Package
You ask several questions:

1) Reusing a connection is faster than establishing a new connection for every use, so depending on your code this will speed up your application a little. But reusing connections is more complex to get right, which is why many people use connection pools.

2) If your program has a short runtime, you can work with one connection, e.g. in a global variable. If your application is a long-running server application, you need to maintain the connection, because the server can close it when no traffic runs over it and it assumes nobody is using it. This often happens overnight in server applications. Connection maintenance (keep-alive and validation) is part of what connection pools do for you.

Summary: if your application is simple, single-threaded, and not a server application, reuse your single connection. Otherwise, open a new connection every time or use a connection pool.
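For Shiny specifically, a connection pool does not have to be hand-rolled. A sketch using the `pool` package, which wraps any DBI-compliant driver (RJDBC is one) so that connections are created lazily, validated, and reused across requests; the jar path, server address, and credentials below are placeholders:

```r
library(RJDBC)
library(pool)
library(shiny)

drv <- JDBC("com.microsoft.sqlserver.jdbc.SQLServerDriver",
            "/path/to/sqljdbc4.jar")

# dbPool() forwards its extra arguments to dbConnect() on the
# underlying driver.
pool <- dbPool(drv,
               url = "jdbc:sqlserver://sql.server.address.com",
               user = "username",
               password = "password")

# A pool object can be queried like a connection: it checks a
# connection out, runs the query, and returns it to the pool.
res <- dbGetQuery(pool, "SELECT 1 AS ok")

# Close all pooled connections when the Shiny app shuts down.
onStop(function() poolClose(pool))
```

This gives you the safety of scenario 2 (no shared mutable connection) with most of the performance of scenario 1 (connections are recycled, not re-established per request).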
It might help to consider what happens behind the scenes every time you establish a connection: the client opens a TCP connection (several network round trips), authenticates against the server, and the server allocates memory and session state for the new session. Therefore it makes sense to limit the number of connections used by your application.

If your application executes all transactions in sequence, you should open the connection once and reuse it. Use a connection pool for a server-based multi-user application.
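If you do keep a single long-lived connection, a common middle ground is to validate it with a cheap query before each use and reconnect when it has gone stale. A hedged sketch built on the NewConnection() helper from the question (the caching variable and helper name are my own):

```r
library(RJDBC)

# Cache one connection, but test it with a trivial query before each
# use and transparently reconnect if the server has dropped it
# (e.g. overnight). Assumes NewConnection() from the question.
.conn <- NULL

GetConnection <- function() {
  alive <- !is.null(.conn) &&
    !inherits(try(dbGetQuery(.conn, "SELECT 1"), silent = TRUE),
              "try-error")
  if (!alive) {
    .conn <<- NewConnection()
  }
  .conn
}
```

The extra "SELECT 1" round trip per use is cheap compared with re-establishing a JDBC connection, and it is essentially what connection pools do internally when they validate a connection before handing it out.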