Executing SSIS package creates huge no. of temp files which makes me run out of disk space

I have an SSIS package that I run via a SQL Agent job to bulk copy data from one database to another. The destination is our integration server, which has plenty of space for the database. But when I run this job (i.e. the package), it creates a huge number of temp files in the Local Settings/Temp folder: for a 1 GB mdf file it creates some 20 GB of temp files. I created this package manually and did not use the Import/Export Wizard. Can anyone help me avoid these huge temp files during execution? If any further details are needed, please ask.

Note: many have said this happens if you create a package using the Import/Export Wizard and set "optimize for many tables" to true. But in this package I query only one table, and I created it manually without using the Import/Export Wizard.

asked Dec 20 '22 by venkatesh


2 Answers

Why is the package creating temp files?

SSIS is an in-memory ETL solution, except when it can't keep everything in memory and begins swapping to disk.

Why would restructuring the package as @jeff hornby suggested help?

Fully and partially blocking transformations force memory copies in your data flow. Assume you have 10 buckets carrying 1 MB of data each. When you use a blocking transformation, as those buckets arrive at the transformation the data has to be copied from one memory location to another. With something like a Union All transformation you've now doubled your package's total memory consumption: 10 MB of data used before the transformation and then another 10 MB after it.

Only use columns that you need. If a column is not in your destination, don't add it to the data flow. Use the database to perform sorts and merges. Cast your data to the appropriate types before it ever hits the data flow.
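
For example, here is a rough sketch of a source query that does that trimming in the database instead of in the data flow; the table and column names (dbo.Orders, OrderID, CustomerCode, OrderTotal) are made up for illustration:

    -- Hypothetical source query: pull only the columns the destination needs,
    -- cast them to the destination types up front, and let the database sort.
    SELECT
        CAST(o.OrderID      AS int)           AS OrderID,
        CAST(o.CustomerCode AS nvarchar(20))  AS CustomerCode,
        CAST(o.OrderTotal   AS decimal(18,2)) AS OrderTotal
    FROM dbo.Orders AS o
    WHERE o.OrderDate >= '20220101'   -- only the rows you actually need
    ORDER BY o.OrderID;               -- sort here rather than in a Sort transformation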

What else could be causing temp file usage?

Lookup transformations. I've seen people crush their ETL server when they use SELECT * FROM dbo.BillionRowTable when all they needed was one or two columns for the current time period. The default behaviour of a lookup operation is to execute that source query and cache the results in memory. For large tables, wide and/or deep, this can make it look like your data flow isn't even running as SSIS is busy streaming and caching all of that data as part of the pre-execute phase.
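
As a rough illustration (the table, columns, and date filter are hypothetical), swapping the default "use a table" setting for a trimmed query keeps the lookup cache small:

    -- Instead of letting the Lookup cache SELECT * FROM dbo.BillionRowTable,
    -- cache only the join key and the single column you need, filtered down
    -- to the current period.
    SELECT
        brt.BusinessKey,
        brt.SurrogateKey
    FROM dbo.BillionRowTable AS brt
    WHERE brt.EffectiveDate >= DATEADD(MONTH, -1, GETDATE());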

Binary/LOB data. Have an (n)varchar(max)/varbinary(max) or classic BLOB data type in your source table? Sorry, that's not going to be in memory. Instead, the data flow is going to carry a pointer along and write a file out for each one of those objects.

Too much parallel processing. SSIS is awesome in that you get free parallelization of your processing. Except you can have too much of a good thing. If you have 20 data flows all floating in space with no precedence between them, the Integration Services engine may try to run all of them at once. Add a precedence constraint between them, even if it's just an On Completion constraint (versus On Success/On Failure), to force some serialization of operations. Inside a data flow, you can introduce the same challenge by having unrelated operations going on. My rule of thumb is that, starting at any source or destination, I should be able to reach all the other sources/destinations.

What else can I do?

Examine what else is using memory on the box. Have you set a sane (non-default) maximum memory value for SQL Server? SSIS loves RAM like a fat kid loves cake, so you need to balance the memory needs of SSIS against the database engine itself; they have completely separate memory spaces.
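
If you haven't capped the database engine yet, a quick way to do it is sp_configure; the 8192 MB value below is only a placeholder to adjust for your box:

    -- Cap SQL Server's memory so the SSIS runtime (a separate process with
    -- its own memory space) isn't starved. 8192 MB is an example value only.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)', 8192;
    RECONFIGURE;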

Each data flow has the ability to set the BufferTempStoragePath and BlobTempStoragePath properties. Take advantage of this and point them at a drive with sufficient storage.

Finally, add more RAM. If you can't make the package better by doing the above, throw more hardware at it and be done.

answered Dec 28 '22 by billinkc


If you're getting that many temp files, then you probably have a lot of blocking transforms in your data flow. Try to eliminate the following types of transformations: Aggregate, Fuzzy Grouping, Fuzzy Lookup, Row Sampling, Sort, Term Extraction. Partially blocking transformations can create the same problem, though not on the same scale: Data Mining Query, Merge, Merge Join, Pivot, Term Lookup, Union All, Unpivot. You might want to minimize these as well.

Most likely the problem is a Sort transformation somewhere in your data flow (this is the most common cause). You might be able to eliminate it by using an ORDER BY clause in your source SQL statement. Just remember to set the IsSorted property on the data source's output so SSIS knows the data is already sorted.
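
A minimal sketch of that approach, using made-up table and column names:

    -- Sort in the database instead of with a Sort transformation. After using
    -- this as the source query, open the source's Advanced Editor and set
    -- IsSorted = True on the output and SortKeyPosition = 1 on OrderID.
    SELECT
        o.OrderID,
        o.CustomerCode,
        o.OrderTotal
    FROM dbo.Orders AS o
    ORDER BY o.OrderID;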

answered Dec 28 '22 by Jeff Hornby