I have an SSIS package--two data flow tasks, 8 components each, reading from two flat files, nothing spectacular. If I run it in BIDS, it reliably takes about 60 seconds. I have a sandbox DB server where the package runs in a job, which also reliably takes 30-60 seconds. On my production server, the same job with the same package takes anywhere from 30 seconds to 12 hours.
With logging enabled on the package, it looks like it bogs down--initially at least--in the pre-execute phase of one or the other (or both) data flow tasks. But I can also see the data coming in--slowly, in chunks--so I think it does move on from there eventually. The IO subsystem gets pounded, and SSIS generates many large temp files (about 150MB worth--my input data files are only about 24MB put together) and reads and writes vigorously from those files (thrashing?).
Of note, if I point my BIDS instance of the package at the production server, it still takes only about 60 seconds to run! So it must be something about running dtexec there, not the DB itself.
I've already tried to optimize my package by reducing the input row byte size, and I made the two data flow tasks run in series rather than in parallel--to no avail.
Both DB servers are running MSSQL 2008 R2 64-bit, same patch level. Both servers are VMs on the same host, with the same resource allocation. Load on the production server should not be that much higher than on the sandbox server right now. The only difference I can see is that the production server is running Windows Server 2008, while the sandbox is on Windows Server 2008 R2.
Help! Any ideas to try are welcome. What could be causing this huge discrepancy?
Here's what my package looks like…
The control flow is extremely simple:
The data flow looks like this:
The second data flow task is exactly the same, just with a different source file and destination table.
The completion constraint in the Control Flow is only there to make the tasks run serially, to try to cut down on the resources needed concurrently (not that it helped solve the problem); there is no actual dependency between the two tasks.
I'm aware of potential issues with blocking and partially-blocking transforms (can't say I understand them completely, but somewhat at least) and I know the aggregate and merge join are blocking and could cause problems. However, again, this all runs fine and quickly in every other environment except the production server…so what gives?
The reason for the Merge Join is to make the task wait for both branches of the Multicast to complete. The right branch finds the minimum datetime in the input and deletes all records in the table after that date, while the left branch carries the new input records for insertion--so if the right branch proceeds before the aggregate and deletion, the new records will get deleted (this happened). I'm unaware of a better way to manage this.
The error output from "Delete records" is always empty--this is deliberate, as I don't actually want any rows from that branch in the merge (the merge is only there to synchronize completion as explained above).
See comment below about the warning icons.
If you have logging turned on, preferably to SQL Server, add the OnPipelineRowsSent event. You can then determine where the package is spending all of its time. See this post. Your IO subsystem is getting slammed, and all those temp files are being generated, because you are no longer able to keep all the information in memory (due to your async transformations).
The relevant query from the linked article is the following. It looks at events in the sysdtslog90 table (SQL Server 2008+ users: substitute sysssislog) and performs some time analysis on them.
;
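-- Identify each package execution and assign it a run number, ordered by start time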
WITH PACKAGE_START AS
(
SELECT DISTINCT
Source
, ExecutionID
, Row_Number() Over (Order By StartTime) As RunNumber
FROM
dbo.sysdtslog90 AS L
WHERE
L.event = 'PackageStart'
)
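-- Parse each OnPipelineRowsSent message to recover the component name and row count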
, EVENTS AS
(
SELECT
SourceID
, ExecutionID
, StartTime
, EndTime
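-- Dig the data flow component name out of the colon-delimited message text
-- (starting the colon search at position 56 skips the fixed message preamble)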
, Left(SubString(message, CharIndex(':', message, CharIndex(':', message, CharIndex(':', message, CharIndex(':', message, 56) + 1) + 1) + 1) + 2, Len(message)), CharIndex(':', SubString(message, CharIndex(':', message, CharIndex(':', message, CharIndex(':', message, CharIndex(':', message, 56) + 1) + 1) + 1) + 2, Len(message)) ) - 2) As DataFlowSource
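-- The row count is the final colon-delimited field in the message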
, Cast(Right(message, CharIndex(':', Reverse(message)) - 2) As int) As RecordCount
FROM
dbo.sysdtslog90 AS L
WHERE
L.event = 'OnPipelineRowsSent'
)
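-- Aggregate rows per component and compute throughput (rows per second),
-- guarding against a zero-millisecond duration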
, FANCY_EVENTS AS
(
SELECT
SourceID
, ExecutionID
, DataFlowSource
, Sum(RecordCount) RecordCount
, Min(StartTime) StartTime
, (
Cast(Sum(RecordCount) as real) /
Case
When DateDiff(ms, Min(StartTime), Max(EndTime)) = 0
Then 1
Else DateDiff(ms, Min(StartTime), Max(EndTime))
End
) * 1000 As RecordsPerSec
FROM
EVENTS DF_Events
GROUP BY
SourceID
, ExecutionID
, DataFlowSource
)
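-- Report one row per component per run, with row counts, throughput, and total duration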
SELECT
'Run ' + Cast(RunNumber As varchar) As RunName
, S.Source
, DF.DataFlowSource
, DF.RecordCount
, DF.RecordsPerSec
, Min(S.StartTime) StartTime
, Max(S.EndTime) EndTime
, DateDiff(ms, Min(S.StartTime)
, Max(S.EndTime)) Duration
FROM
dbo.sysdtslog90 AS S
INNER JOIN
PACKAGE_START P
ON S.ExecutionID = P.ExecutionID
LEFT OUTER JOIN
FANCY_EVENTS DF
ON S.SourceID = DF.SourceID
AND S.ExecutionID = DF.ExecutionID
WHERE
S.message <> 'Validating'
GROUP BY
RunNumber
, S.Source
, DataFlowSource
, RecordCount
, DF.StartTime
, RecordsPerSec
, Case When S.Source = P.Source Then 1 Else 0 End
ORDER BY
RunNumber
, Case When S.Source = P.Source Then 1 Else 0 End Desc
, DF.StartTime , Min(S.StartTime);
You were able to use this query to discern that the Merge Join was the lagging component. Why it performs differently between the two servers, I can't say at this point.
If you have the ability to create a table in your destination system, you could modify your process to use two data flows and eliminate the costly async components: land the raw file in a staging table first, then do the delete and insert set-based, as sketched below.
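As a rough illustration of the set-based version: the first data flow dumps the flat file into a staging table with no transforms, and an Execute SQL Task then handles the delete and the insert. This is only a sketch; the object names (dbo.InputStaging, dbo.TargetTable, RecordDate) are placeholders, not anything from your actual package.

-- Placeholder names throughout; adjust to your schema.
-- Step 1 replicates the right branch of the Multicast: find the earliest
-- datetime in the incoming batch and delete everything from that point on.
DELETE T
FROM dbo.TargetTable AS T
WHERE T.RecordDate >= (SELECT MIN(S.RecordDate) FROM dbo.InputStaging AS S);

-- Step 2 replicates the left branch: insert the new records in one
-- set-based statement.
INSERT INTO dbo.TargetTable (RecordDate, Col1, Col2)
SELECT S.RecordDate, S.Col1, S.Col2
FROM dbo.InputStaging AS S;

Because the two statements run sequentially in one task, the delete is guaranteed to finish before the insert, which is exactly the ordering your Merge Join was there to enforce--no Aggregate, Multicast, or Merge Join required.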