 

SSIS Package Full Table Load Slow

We have an SSIS package that the development team describes as 'slow'. Since they do not have anyone with SSIS/ETL experience, as a DBA I tried digging into it. Below is what I found. The SQL Server was a 2014 instance upgraded in-place to 2017, so it has SSIS of both versions.

  1. They load a 200 GB SQL Server table into SSIS and then zip the data into a flat file using a command-line zip utility.
  2. The data flow task simply runs a select * from view - the view is just a wrapper over the table, with no other joins.
  3. While troubleshooting I found that there is hardly any load on SQL Server, possibly because the select is running on a single thread and not utilizing the SQL Server cores.
  4. When I run the same select * command myself (only for 5 seconds, since it is a 200 GB table), my query is also single-threaded.
  5. The package has a configuration file, which the SQL Agent job shows (this is how the package runs), with some connection settings.
  6. Opening the package in BIDS shows DefaultBufferMaxRows as only 10000 (presumably the default value); since neither the configuration file nor any variables set a custom value, I assume this is what the package is using too.

Both SQL Server and SSIS are on the same server. SQL Server's max memory has been capped, leaving around 100 GB for SSIS and the OS.

Kindly share any ideas on how I can force SQL Server to run this select using multiple threads so that the entire table gets into the SSIS buffers faster.

Edit: I am aware that bcp can read data faster than almost any other process and save it to a flat file, but at this point changes to the SSIS package have to be kept to a minimum, so I am exploring options that can be incorporated within the SSIS package.
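For reference, the bcp baseline being compared against would look something like the sketch below; the server name, database, view, output path, and the follow-up zip step are all placeholders, not the actual package's values:

```shell
REM Export the view's contents to a flat file with bcp, then zip it.
REM MYSERVER, MyDb, dbo.MyView and the output path are placeholders.
REM -T = trusted connection, -c = character mode, -a = large packet size.
bcp "MyDb.dbo.MyView" out "D:\Exports\MyView.dat" -S MYSERVER -T -c -a 65535
```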

Edit 2: Parallelism works perfectly on my SQL Server, as I have verified with a lot of other queries. The table in question is 200 GB. It is something with SSIS only, which is not hammering my DB as hard as it should.

Edit 3: I have made some progress. I adjusted the buffer size to 100 MB and max rows to 100,000, and now the package seems to be doing better. When I run this package directly on the server using the dtexec utility, it generates a good load of 40-50 MB per second, but through the SQL Agent job it never generates a load of more than 10 MB, so I am trying to figure out this behavior.

Edit 4: I found that when I run the package by logging on to the server and invoking the dtexec utility, it runs well: it generates a good load on the DB and data I/O stays steady between 30-50 MB/sec. The same package run from the SQL Agent job never pushes I/O above 10 MB/sec.

I even tried to run the package from the Agent using a CmdExec (operating system command) job step, but no change. The Agent really falls short here; any pointers on what could be wrong?

Final try: I am stumped by what I am finally observing:

  1. The same package runs 3x faster when run from a command prompt on the Windows node by invoking the dtexec utility.
  2. The exact same package runs 3x slower than the above when invoked by SQL Agent, which has sysadmin permissions on both Windows and SQL Server.

In both cases I checked which version of dtexec they invoke, and they both invoke the same version. So why one would be so slow is beyond my understanding.

asked Aug 08 '20 by ChandanJha

2 Answers

I don't think there is a general solution to this issue, since it is a particular case and you haven't provided much information. Since there are two components in your data flow task (OLE DB Source and Flat File Destination), I will give some suggestions related to each component.

Before giving suggestions for each component, it is worth mentioning the following:

  1. If no transformations are applied within the data flow task, it is not recommended to use this task at all; the bcp utility is preferable.
  2. Check the TempDB and the database log sizes.
  3. If a clustered index exists, try rebuilding it. If not, try creating a clustered index.
  4. To identify the component that is slowing the package down, open the package in Visual Studio, remove the Flat File Destination, and replace it with a dummy Script Component (write any no-op code, for example: string s = "";). Then run the package: if it is fast enough, the problem is caused by the Flat File Destination; otherwise you need to troubleshoot the OLE DB Source.
  5. Try executing the query in SQL Server Management Studio and examine the execution plan.
  6. Check the package's TargetServerVersion property within the package configuration and make sure it is correct.
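Related to point 5, you can also watch the degree of parallelism of the running request from another session while the package executes. A sketch using sqlcmd; the server name is a placeholder:

```shell
REM List running requests with their DOP (sys.dm_exec_requests.dop is
REM available on SQL Server 2016 and later). MYSERVER is a placeholder;
REM -E uses a trusted connection.
sqlcmd -S MYSERVER -E -Q "SELECT r.session_id, r.dop, r.command, t.text FROM sys.dm_exec_requests r CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t WHERE r.session_id <> @@SPID"
```

If the extraction query consistently shows dop = 1 here, the single-threaded read suspected in the question is confirmed.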

OLE DB Source

As you mentioned, you are using a Select * from view query, where the data is stored in a table containing a considerable amount of data. The SQL Server query optimizer may find that reading data using a Table Scan is more efficient than reading from indexes, especially if your table does not have a clustered index (rowstore or columnstore).

There are many things you may try to improve data load:

  1. Try replacing the Select * from view with the original query used to create the view.
  2. Try changing the data provider used in the OLE DB Connection Manager: SQL Server Native Client or the newer Microsoft OLE DB Driver for SQL Server (not the old SQLOLEDB provider).
  3. Try increasing the DefaultBufferMaxRows and DefaultBufferSize properties. more info
  4. Try using a SQL Command with specific column names instead of selecting the view name (the Table or View data access mode). more info
  5. Try to load the data in chunks.
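The chunking idea in point 5 amounts to running the same query several times over disjoint key ranges, for example from a For Loop container whose variables feed the OLE DB Source's SQL command. A sketch of one such range query, shown here with sqlcmd; the server, database, view, key column, and boundaries are all placeholders:

```shell
REM One chunk of a range-based extraction. In SSIS, the 0 and 50000000
REM boundaries would come from package variables that a For Loop
REM container advances on each iteration. All names are placeholders.
sqlcmd -S MYSERVER -E -d MyDb -Q "SELECT * FROM dbo.MyView WHERE Id > 0 AND Id <= 50000000"
```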

Flat File Destination

  1. Check that the flat file directory is not located on the same drive where the SQL Server instance is installed.
  2. Check that the flat file is not located on a busy drive.
  3. Try to export the data into multiple flat files instead of one huge file (split the data into smaller files), since as the amount of data in a single file grows, writing to that file becomes slower and the package as a whole slows down. (See the 5th suggestion above.)
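If the package must still write a single file, the output can also be broken up afterwards with a standard tool. On a POSIX shell the idea looks like the sketch below; the sample file stands in for the real flat-file destination, and the line count is a placeholder:

```shell
# Create a tiny stand-in for the exported flat file, then split it into
# 2-line pieces with numeric suffixes (export_00, export_01, ...).
# For a real 200 GB export you would use a much larger -l value.
printf 'row1\nrow2\nrow3\nrow4\nrow5\n' > export.dat
split -l 2 -d export.dat export_
ls export_*   # export_00 export_01 export_02
```

Concatenating the pieces back together reproduces the original file, so nothing is lost by splitting.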
answered Oct 11 '22 by Hadi


Any indexes on the table could slow the load. If there are any, try dropping them before the load and recreating them afterwards. This also refreshes the index statistics, which would otherwise be skewed by the bulk insert.

answered Oct 11 '22 by Joe