tl;dr version
I get errors when using OLE DB (SNC10.0) connection managers after a few nights of running, could connections not be properly timing out? Switching to ADO.NET Connection Managers and sources seems to fix it, why?
I apologize for the generic title but there are too many details to state in a single line.
Technology:
In all cases the database server, both source and destination are SQL Server 2008 R2
The setup:
I have a set of SSIS packages that run one after another in the middle of the night. There are 7 of them currently. They all perform a similar set of tasks: they first connect to a source database and copy the data to a staging database. Then they do various transformations within the staging database. Finally the process connects to a target database and fills it with data.
I set up all the connections as OLE DB connections (SQL Native Client 10.0) so that I can use them with Lookup components and other OLE-specific components.
The problem:
We have been experiencing issues repeatedly with our automated runs of the SSIS package. Generally I will test it running manually from my station, it will run fine; then we'll save the SSIS package into the SQL Server and schedule it and it will run fine. A few nights later, we will get an issue such as:
SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available. Source: "Microsoft OLE DB Provider for SQL Server" Hresult: 0x80004005 Description: "Protocol error in TDS stream".
or
SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Unknown token received from SQL Server".
When searched for online, both of these point toward connection issues and specifically network connectivity problems.
Work-around:
I found that a quick, if not always simple, solution to these issues is to replace the source nodes with ADO NET Sources rather than OLE DB Sources. This is acceptable within my Data Flow tasks for some cases, but in the cases where I need to use a Lookup component, or some other such tool that only works with the OLE sources, this is not a good enough solution if I will still encounter these issues.
Question:
I know there are tons of differences between ADO.NET and OLE DB connections, but one primary thing I noticed is that the OLE DB connection manager has two timeouts, both defaulted to the value of '0' which generally means disabled (no timeout). The ADO.NET connection manager has a single timeout and it's set to the value of '15' (15 seconds).
How do these two connection managers handle the time-outs and closing connections? With a value of 0 in an OLE DB connection manager timeout, will that connection never be closed unless something is done on the SQL Server? Could this be part of my issue, with so many data-flow tasks opening OLE DB connections and then not being closed? Is there anything I can do in an SSIS package to forcefully close these connections?
****EDIT****
Here is a screenshot of the data flow task in question. I have altered some of the names to protect the innocent, etc.
The task as pictured here will run completely fine and works 100% of the time. If I change that ADO.NET Source to an OLE DB Source I get the errors mentioned in the post. I have, in some other cases, went through and eliminated Lookups by expanding the source query. In this task I have not.
We found out what the source of all our issues were and the reason the description of problems didn't match the description of the environment and there were not enough clues to solve.
Finally everything blew up and we found out that "nothing has changed on the network or servers" was not the case.
There was a backup occurring in the middle of our jobs. That backup was using volume shadow copy and was backing up both the production databases as well as tempdb. Because of disk IO issues/locks, the tempdb had grown to half a terrabyte due to unwritten changes and then that was further attempting to be shadow-copied.
Turning off the backup/shadow copy on tempdb and the production database caused the jobs to go through immediately. Queries that were taking > 30 minutes are now < 1 minute.
Thank you guys for sticking with me and thinking through it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With