In SSIS, how do I get the number of rows returned from the Source that SHOULD be processed

Tags: sql-server, event-handling, events, rows, ssis

I am working on a project to add logging to our SSIS packages. I am doing my own custom logging by implementing some of the event handlers. I have implemented the OnInformation event to write the time, source name, and message to the log file. When data is moved from one table to another, the OnInformation event will give me a message such as:

component "TABLENAME" (1)" wrote 87 rows.

Suppose one of the rows fails and only 85 rows are processed out of the expected 87. I would assume that the above line would then read "wrote 85 rows". How do I track how many rows SHOULD HAVE been processed in this case? I would like to see something like "wrote 85 of 87 rows". Basically, I think I need to know how to get the number of rows returned from the Source's query. Is there an easy way to do this?

Thank you

asked Jan 29 '13 20:01

nleidwinger18

2 Answers

You can use the Row Count transformation after the data source and save its value to a variable. This is the number of rows to be processed. Once the data has been loaded into the destination, use an Execute SQL Task in the Control Flow to run Select Count(*) from <<DestinationTable>> and save the count into another variable. (You should use a WHERE clause in your query to identify the current load.) That gives you the number of rows processed for logging.
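The bookkeeping can be sketched roughly as follows, in Python with an in-memory SQLite database standing in for SQL Server. The table name `destination_table`, the `load_id` column, and the helper `log_rows_written` are hypothetical; in a real package the expected count would come from the Row Count variable and the written count from the Execute SQL Task.

```python
import sqlite3


def log_rows_written(conn, load_id, expected_rows):
    """Count destination rows for one load and format a log message."""
    # The WHERE clause limits the count to the current load
    # (load_id is a hypothetical column that identifies it).
    (written,) = conn.execute(
        "SELECT COUNT(*) FROM destination_table WHERE load_id = ?",
        (load_id,),
    ).fetchone()
    return f"wrote {written} of {expected_rows} rows"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE destination_table (id INTEGER, load_id INTEGER)")

# An earlier load (load_id = 1) left 10 rows behind;
# the current load (load_id = 2) managed to write 85 rows.
conn.executemany(
    "INSERT INTO destination_table VALUES (?, ?)", [(i, 1) for i in range(10)]
)
conn.executemany(
    "INSERT INTO destination_table VALUES (?, ?)", [(i, 2) for i in range(85)]
)

# 87 stands in for the value stored by the Row Count transformation.
message = log_rows_written(conn, load_id=2, expected_rows=87)
print(message)
```

Without the WHERE clause the count would include the 10 rows from the earlier load, which is why the current load has to be identifiable in the destination.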

Hope this helps!

answered Sep 18 '22 10:09

Gowdhaman008

Not enough space in comments to provide feedback. Posting an incomplete answer as I need to leave for the day.

You are going to have trouble accomplishing what you are asking for. Based on your comments on Gowdhaman008's answer, the value of a variable is not visible outside of a Data Flow until after the finalizer event fires (OnPostExecute, I think). You can cheat and get that data out by using a script task to count rows as they pass through and firing events, custom or predefined, to report package progress. In fact, just capture the OnPipelineRowsSent event. That will record how many rows are passing through a particular juncture and the time surrounding it (see the SSIS Performance Framework). Plus, you don't have to do any custom work or maintenance on your stuff. Out-of-the-box functionality is a definite win.

That said, you aren't really going to know how many rows are coming out of a source until it's finished. That sounds stupid and I completely agree, but it's the truth. Imagine a simple case: an OLE DB Source that is going to send 1,000,000 rows straight into an OLE DB Destination. Most likely, not all 1M rows are going to enter the pipeline at once; maybe only 10k will be in the first buffer. That buffer is pushed to the destination and now you know 10k rows out of 10k rows have been processed. Lather, rinse, repeat a few times until, in one buffer, a row has a NULL where it shouldn't. Boom goes the dynamite and the process fails. We have had 60k rows flow into the pipeline and that's all we know about because of the failure.
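To make the buffer argument concrete, here is a toy simulation in Python. It is not SSIS code; the function names and the idea of failing on a given buffer number are assumptions made purely for illustration.

```python
def source_buffers(total_rows, buffer_size):
    """Yield row counts one buffer at a time, the way a source feeds a pipeline."""
    sent = 0
    while sent < total_rows:
        size = min(buffer_size, total_rows - sent)
        sent += size
        yield size


def run_data_flow(total_rows, buffer_size, fail_on_buffer):
    """Return (rows_seen, failed).

    The destination fails while handling the given 1-based buffer number;
    pass 0 to simulate a run with no failure.
    """
    rows_seen = 0
    buffers = source_buffers(total_rows, buffer_size)
    for number, size in enumerate(buffers, start=1):
        rows_seen += size  # these rows have entered the pipeline
        if number == fail_on_buffer:
            # The remaining source rows are never read, so the
            # pipeline has no idea how many there would have been.
            return rows_seen, True
    return rows_seen, False


rows_seen, failed = run_data_flow(1_000_000, 10_000, fail_on_buffer=6)
print(rows_seen, failed)
```

At the point of failure the only number available is the count of rows that have already entered the pipeline, not the 1,000,000 the source would eventually have produced.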

The only way to ensure we have accounted for all the source rows is to put an asynchronous transformation into the pipeline to block all downstream components until all the data has arrived. This will obliterate any chance you have of getting good performance out of your packages. You'd still be subject to the aforementioned restrictions on updating variables, but your FireXEvent message would accurately describe how many rows could have been processed in the queue.

If you started an explicit transaction, you could do something ugly like an Execute SQL Task just to get the expected count, write that to a variable, and then log rows processed. But then you're querying your data twice, and you increase the likelihood of blocking on the source system because of the double pump. And that's only going to work for something database-like. The same concept would apply for a flat file, except now you'd need a script task to read all the rows first.
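A rough sketch of that "double pump", again in Python with in-memory SQLite standing in for the real source and destination. The table names, the `amount` column, and the NOT NULL rule that rejects two rows are all made up for the example.

```python
import sqlite3

# isolation_level=None lets us issue BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE source_table (id INTEGER, amount INTEGER)")
conn.execute("CREATE TABLE destination_table (id INTEGER, amount INTEGER NOT NULL)")

# 87 source rows, two of which carry a NULL the destination will reject.
rows = [(i, i * 10) for i in range(87)]
rows[40] = (40, None)
rows[41] = (41, None)
conn.executemany("INSERT INTO source_table VALUES (?, ?)", rows)

conn.execute("BEGIN")
# First pass over the source: the expected row count.
(expected,) = conn.execute("SELECT COUNT(*) FROM source_table").fetchone()

# Second pass over the same data: the actual load.
written = 0
for row in conn.execute("SELECT id, amount FROM source_table").fetchall():
    try:
        conn.execute("INSERT INTO destination_table VALUES (?, ?)", row)
        written += 1
    except sqlite3.IntegrityError:
        pass  # row rejected by the NOT NULL constraint
conn.execute("COMMIT")

message = f"wrote {written} of {expected} rows"
print(message)
```

The source is read twice inside one transaction, which is exactly the extra load and blocking risk described above.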

Where this gets uglier is with a slow-starting data source, like a web service. The default buffer size might cause the entire package to run much longer than it needs to, simply because we are waiting on the data to arrive (see: Slow starts).

What I'd do

I'd record my starting and error counts (and more) using the Row Count transformation. This will help you account for all the data that came in and where it went. I'd then turn on the OnPipelineRowsSent event so that I can query the log and see how many rows are flowing through it RIGHT NOW.

[Image: https://i.stack.imgur.com/wtpR0.png]
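Once the OnPipelineRowsSent messages are in the log, they can be totalled per path. The sketch below is in Python and assumes a colon-delimited message layout ending in path ID, path name, component ID, component name, input ID, input name, and row count. That layout, the sample messages, and the function name are assumptions for illustration, so check them against what your own log actually contains.

```python
from collections import defaultdict

# Assumed layout of each logged message (verify against your own log):
# "<description> :  : PathID : PathName : ComponentID : ComponentName
#  : InputID : InputName : RowsSent"
SAMPLE_MESSAGES = [
    "Rows were provided to a data flow component as input. :  : 66 : "
    "OLE DB Source Output : 21 : OLE DB Destination : 22 : "
    "OLE DB Destination Input : 10000",
    "Rows were provided to a data flow component as input. :  : 66 : "
    "OLE DB Source Output : 21 : OLE DB Destination : 22 : "
    "OLE DB Destination Input : 5000",
    "Rows were provided to a data flow component as input. :  : 70 : "
    "OLE DB Destination Error Output : 30 : Row Count : 31 : "
    "Row Count Input 1 : 2",
]


def rows_sent_by_path(messages):
    """Sum the row counts reported for each path name."""
    totals = defaultdict(int)
    for message in messages:
        fields = [field.strip() for field in message.split(":")]
        # Index from the end so the free-text description is ignored.
        path_name = fields[-6]
        rows_sent = int(fields[-1])
        totals[path_name] += rows_sent
    return dict(totals)


totals = rows_sent_by_path(SAMPLE_MESSAGES)
for path, count in totals.items():
    print(f"{path}: {count} rows so far")
```

Run against the live log while the package executes, a total like this shows how many rows have passed each juncture so far.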

answered Sep 21 '22 10:09

billinkc
