Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding parallel TSQL connections

I managed to create parallel connections in R to a TSQL server using the below code:

SQL_retrieve <- function(x){
con <- odbcDriverConnect(
'driver={SQL Server};server=OPTMSLMSOFT02;database=Ad_History;trusted_connection=true')

odbcGetInfo(con)
rawData <- sqlQuery(con,
paste("select * from AD_MDL_R_INPUT a where a.itm_lctn_num = ",
facility[x] )) odbcClose(con) return(rawData) }

cl <- makeCluster(5) registerDoParallel(cl)
outputPar <- foreach(j = 1:facility_count, .packages="RODBC")
%dopar% SQL_retrieve(j) stopCluster(cl)

I would expect to see all connections actively downloading in parallel, but the reality is that only one or two connections are active at a time (see image below).
Even with 32 connections, the total download time is cut by slightly more than 1/2 (should be closer to 1/32, in theory, right?). There are also large pauses between connection activity. Why is this?

Connection Utilization

Some notes to keep in mind:

  • The TSQL server and R are both on the same server, so network latency not an issue.
  • The TSQL server allows up to a max of ~32k connections, so we are not bumping into a session limit issue.

UPDATE 7/26/17 Taking another stab at this problem and it now works (code unchanged). Not sure what happened between now and initial posting, but perhaps some changes to MS SQL server settings (unlikely).

The time to pull 7.9 million rows follows the curve in the image below.

Time versus SQL Connections

like image 302
Evan Larson Avatar asked Aug 17 '16 14:08

Evan Larson


People also ask

How does parallel work in SQL?

Introduction to Parallel Execution. When Oracle runs SQL statements in parallel, multiple processes work together simultaneously to run a single SQL statement. By dividing the work necessary to run a statement among multiple processes, Oracle can run the statement more quickly than if only a single process ran it.

Can SQL queries run in parallel?

The queries run in parallel, as far as possible. The database uses different locks for read and write, on rows, blocks or whole tables, depending on what you do. If one query only reads from a table, another query can also read from the same table at the same time.

What is parallel plan in SQL Server?

A parallel plan is any plan for which the Optimizer has chosen to split an applicable operator into multiple threads that are run in parallel.

Can we execute queries parallel from different session?

No, each query will require its own session. To execute in parallel, each query must be conducted in its own session.


1 Answers

SQL Server uses "Connection Pooling."

A connection takes a lot of time to establish from scratch.

An applications will make repeated identical connections, so pooling increases performance. SQL half-closes connections, so the next connection will start with a leg up and be much quicker.

You don't want to use pooling in your instance. You can turn off pooling by adding "pooling=false;" as mentioned above by @rene-lykke-dahl. That should resolve your issue.

Read about connection pooling here:

like image 82
Jason Geiger Avatar answered Oct 07 '22 02:10

Jason Geiger