SSIS Balanced Data Distributor - Increase number of operations?

Tags: ssis, balanced-data-distributor

As per the attached, we have a Balanced Data Distributor (BDD) set up in a data transformation covering about 2 million rows. The script tasks are identical: each one opens a connection to Oracle and executes first a delete and then an insert. (This isn't relevant, but it's done that way due to parameter issues with the OLE DB Command and the Microsoft OLE DB Provider for Oracle...)

[Screenshot of the data flow]

The issue I'm running into is that no matter how large I make my buffers or how many concurrent executions I configure, the BDD will not execute more than five concurrent processes at a time.

I've pulled back hundreds of thousands of rows in a larger buffer, and it just gets divided 5 ways. I've tried this on multiple machines - the current shot is from a 16 core server with -1 concurrent executions configured on the package - and no matter what, it's always 5 parallel jobs.

5 is better than 1, but with 2.5 million rows to insert/update, 15 rows per second at 5 concurrent executions isn't much better than 2-3 rows per second with 1 concurrent execution.
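To put those rates in perspective, here is a rough back-of-the-envelope sketch. It assumes the 2.5 million rows and the rates quoted above hold steady for the whole load, and it takes 2.5 rows/s as the midpoint of "2-3 rows per second"; the function name is only illustrative.

```python
def hours_to_load(total_rows: int, rows_per_second: float) -> float:
    """Return the wall-clock hours needed to process total_rows at a constant rate."""
    return total_rows / rows_per_second / 3600


TOTAL_ROWS = 2_500_000  # row count quoted in the question

# 2.5 rows/s is an assumed midpoint of the "2-3 rows per second" single-path rate.
for rate in (2.5, 15):
    print(f"{rate:>4} rows/s -> {hours_to_load(TOTAL_ROWS, rate):.1f} hours")
```

Even at the faster rate the load runs for well over a day, which is why five paths still isn't enough.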

Can I force the BDD to use more paths, and if so how?

asked Oct 15 '13 13:10

The Evil Greebo

2 Answers

Short answer:

Yes, the BDD can make use of more than five paths. You shouldn't have to do anything special to force it; by design it should do this for you automatically. Then why isn't it using more than 5 paths? Because your source is producing data faster than your destination can consume it, which causes backpressure. To resolve it, you have to tune your destination components.

Long answer:

In theory, "the BDD takes input data and routes it in equal proportions to its outputs, however many there are." In your setup there are 10 outputs, so the input data should be distributed equally across all 10 outputs at the same time, and you should see 10 paths executing at once. Again, in theory.

But another concept of the BDD is that "instead of routing individual rows, the BDD operates on buffers of data." This means the data flow engine initiates a buffer, fills it with as many rows as possible, and moves that buffer to the next component (the script destination in your case). As you can see, 5 buffers are in use, each with the same number of rows. If additional buffers had been started, you would have seen more paths being used.

SSIS couldn't use additional buffers, and ultimately additional paths, because of a mechanism called backpressure, which kicks in when the source produces data faster than the destination can consume it. Without it, the source data would use up all the memory and SSIS would have none left for the transformation and destination components. To avoid that, SSIS limits the number of active buffers. The limit is set to 5 (it can't be changed), which is exactly the number of threads you're seeing.
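The effect of that cap can be illustrated with a toy simulation. This is only a sketch of the behaviour described above, not SSIS's actual scheduler: the function name, the tick-based timing, and the "first idle output" assignment are all assumptions made for illustration.

```python
def peak_busy_outputs(n_outputs: int, max_active_buffers: int,
                      n_buffers: int, ticks_per_buffer: int) -> int:
    """Toy model: return the largest number of outputs busy at the same time.

    The source may only have max_active_buffers buffers in flight; each
    output needs ticks_per_buffer ticks to consume one buffer.
    """
    remaining = [0] * n_outputs  # ticks of work left per output (0 = idle)
    pending = n_buffers          # buffers the source has not emitted yet
    peak = 0
    while pending or any(remaining):
        for i in range(n_outputs):
            active = sum(1 for r in remaining if r > 0)
            # Backpressure: stop emitting once the active-buffer cap is hit.
            if pending == 0 or active >= max_active_buffers:
                break
            if remaining[i] == 0:
                remaining[i] = ticks_per_buffer
                pending -= 1
        peak = max(peak, sum(1 for r in remaining if r > 0))
        remaining = [max(0, r - 1) for r in remaining]  # one tick passes
    return peak


# 10 outputs, slow consumers, a cap of 5 active buffers (the figure given above).
print(peak_busy_outputs(n_outputs=10, max_active_buffers=5,
                        n_buffers=50, ticks_per_buffer=3))
```

In this model, adding outputs beyond the cap changes nothing; only faster consumers (fewer ticks per buffer) shorten the run, which is the point about tuning the destination.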

PS: The text within quotes is from this article

answered Nov 12 '22 01:11

Samuel Vanga

Another interesting thing I've discovered via this article on CodeProject.

[T]his component uses an internal buffer of 9,947 rows (as per the experiment, I found so) and it is pre-set. There is no way to override this. As a proof, instead of 10 lac [1,000,000] rows, we will use only 9,947 (Nine thousand nine forty seven) rows in our input file and will observe the behavior. After running the package, we will find that all the rows are being transferred to the first output component and the other components received nothing.

Now let us increase the number of rows in our input file from 9,947 to 9,948 (Nine thousand nine forty eight). After running the package, we find that the first output component received 9,947 rows while the second output component received 1 row.

I notice that in your first buffer run you pulled 50,000 records. Those were divided into 9,984-record buckets and passed to each output. So essentially the BDD takes the records it gets from the buffer and passes them out to the outputs in increments of roughly 10,000 records. In this case, then, perhaps your source is the bottleneck.
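The experiment quoted from the CodeProject article can be reproduced on paper with a small sketch. It assumes the 9,947-row buffer size reported there and a simple round-robin hand-out of whole buffers; both the function name and the round-robin order are assumptions, not documented BDD behaviour.

```python
def split_into_buffers(total_rows: int, buffer_rows: int = 9_947,
                       n_outputs: int = 10) -> list[int]:
    """Return rows received per output when rows are handed out one
    buffer at a time, round-robin (an assumed distribution order)."""
    received = [0] * n_outputs
    output = 0
    while total_rows > 0:
        chunk = min(buffer_rows, total_rows)  # the last buffer may be partial
        received[output] += chunk
        total_rows -= chunk
        output = (output + 1) % n_outputs
    return received


print(split_into_buffers(9_947)[:3])  # everything lands on the first output
print(split_into_buffers(9_948)[:3])  # one extra row spills to the second output
```

Under these assumptions an input smaller than one buffer never reaches a second output, so a small or slow source can leave most paths idle.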

Perhaps you'll need to split your original Source query in half and create two BDD-driven data flows to in essence double your parallel throughput.
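As a rough sketch of that split, the source query could be partitioned by key range, one range per data flow. The table name `source_table` and the column `id` are hypothetical, and the sketch assumes a reasonably dense integer key so that the halves are about the same size.

```python
def split_key_range(low: int, high: int, parts: int) -> list[tuple[int, int]]:
    """Split the inclusive integer range [low, high] into up to `parts`
    contiguous, non-overlapping ranges of near-equal size."""
    base, extra = divmod(high - low + 1, parts)
    ranges = []
    start = low
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more parts than keys: skip empty ranges
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Hypothetical table and key column; one query per BDD-driven data flow.
for first, last in split_key_range(1, 2_500_000, 2):
    print(f"SELECT * FROM source_table WHERE id BETWEEN {first} AND {last}")
```

Each generated query would feed its own source component, so each data flow gets its own set of active buffers.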

answered Nov 12 '22 02:11

Kyle Hale
