I managed to create parallel connections in R to a TSQL server using the below code:
SQL_retrieve <- function(x){
con <- odbcDriverConnect(
'driver={SQL Server};server=OPTMSLMSOFT02;database=Ad_History;trusted_connection=true')
odbcGetInfo(con)
rawData <- sqlQuery(con,
paste("select * from AD_MDL_R_INPUT a where a.itm_lctn_num = ",
facility[x] )) odbcClose(con) return(rawData) }
cl <- makeCluster(5) registerDoParallel(cl)
outputPar <- foreach(j = 1:facility_count, .packages="RODBC")
%dopar% SQL_retrieve(j) stopCluster(cl)
I would expect to see all connections actively downloading in parallel, but the reality is that only one or two connections are active at a time (see image below).
Even with 32 connections, the total download time is cut by slightly more than 1/2 (should be closer to 1/32, in theory, right?). There are also large pauses between connection activity. Why is this?
Connection Utilization
Some notes to keep in mind:
UPDATE 7/26/17 Taking another stab at this problem and it now works (code unchanged). Not sure what happened between now and initial posting, but perhaps some changes to MS SQL server settings (unlikely).
The time to pull 7.9 million rows follows the curve in the image below.
Introduction to Parallel Execution. When Oracle runs SQL statements in parallel, multiple processes work together simultaneously to run a single SQL statement. By dividing the work necessary to run a statement among multiple processes, Oracle can run the statement more quickly than if only a single process ran it.
The queries run in parallel, as far as possible. The database uses different locks for read and write, on rows, blocks or whole tables, depending on what you do. If one query only reads from a table, another query can also read from the same table at the same time.
A parallel plan is any plan for which the Optimizer has chosen to split an applicable operator into multiple threads that are run in parallel.
No, each query will require its own session. To execute in parallel, each query must be conducted in its own session.
SQL Server uses "Connection Pooling."
A connection takes a lot of time to establish from scratch.
An applications will make repeated identical connections, so pooling increases performance. SQL half-closes connections, so the next connection will start with a leg up and be much quicker.
You don't want to use pooling in your instance. You can turn off pooling by adding "pooling=false;" as mentioned above by @rene-lykke-dahl. That should resolve your issue.
Read about connection pooling here:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With