
SSIS Flat Files with Variable Column Numbers

Tags: file, ssis, flat

SSIS does two things when handling flat files that are particularly frustrating. It seems there should be a way around them, but I can't figure it out. If you define a flat file with 10 columns, tab-delimited with CRLF as the end-of-row marker, this works perfectly for files where every row has exactly 10 columns. The two painful scenarios are these:

  1. If someone supplies a file with an 11th column anywhere, it would be nice if SSIS simply ignored it, since you haven't defined it. It should just read the 10 columns you have defined and then skip to the end-of-row marker. What it does instead is concatenate any additional data onto the data in the 10th column and bung all of that into the 10th column. Kind of useless really. I realise this happens because the delimiter for the 10th column is not tab like all the others but CRLF, so it just grabs everything up to the CRLF, replacing extra tabs with nothing as it does so. This is not smart, in my opinion.

  2. If someone supplies a file with only 9 columns, something even worse happens. It temporarily disregards the CRLF it has unexpectedly found and pads the missing columns with columns from the start of the next row! "Not smart" is an understatement here. Who would EVER want that to happen? The remainder of the file is garbage at that point.
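Both scenarios follow from a parser that reads each column up to that column's own delimiter and never looks for the row delimiter anywhere else. The sketch below is a toy model of that idea in Python, not SSIS's actual implementation; the function name `parse_sequential` and the three-column sample data are made up for illustration. Note that this model keeps the embedded tab in the last column rather than dropping it.

```python
def parse_sequential(text, n_cols, col_delim="\t", row_delim="\r\n"):
    """Toy model: read every column up to its own delimiter.

    Columns 1..n-1 end at col_delim and only the last column ends at
    row_delim, so a row delimiter seen earlier is treated as ordinary data.
    """
    delims = [col_delim] * (n_cols - 1) + [row_delim]
    rows, pos = [], 0
    while pos < len(text):
        row = []
        for delim in delims:
            end = text.find(delim, pos)
            if end == -1:
                # Delimiter never appears again: take whatever input is left.
                row.append(text[pos:])
                pos = len(text)
            else:
                row.append(text[pos:end])
                pos = end + len(delim)
        rows.append(row)
    return rows


# Scenario 1: an extra column is folded into the last defined column.
too_wide = parse_sequential("a\tb\tc\tEXTRA\r\nd\te\tf\r\n", n_cols=3)

# Scenario 2: a short row swallows the CRLF and steals from the next row.
too_narrow = parse_sequential("a\tb\r\nc\td\te\r\n", n_cols=3)
```

In the first call the third column of the first row comes back as `"c\tEXTRA"`. In the second call the two input rows collapse into a single output row whose second column contains the CRLF and the first field of the following row.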

It doesn't seem unreasonable to have variations in file width for whatever reason. Of course, only variations at the end of a row (x fewer or extra columns) can reasonably be handled, but it looks like this is simply not handled well, unless I'm missing something.

So far our only solution is to load each row as one giant column (column0) and then use a script task to split it dynamically on however many delimiters it finds. This works well, except that it limits row widths to 4000 characters (the maximum width of one Unicode column). If you need to import a wider row (say, with multiple 4000-wide columns for text import), you have to define multiple columns as above, and you are then stuck with requiring a strict number of columns per row.
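The splitting logic of that workaround can be sketched as follows. This is a minimal illustration in Python rather than the C# or VB.NET a real SSIS script would use; the name `split_row` is hypothetical, and dropping surplus fields while padding missing ones with `None` (standing in for NULL) is an assumption about the desired behaviour, not something SSIS does by itself.

```python
def split_row(line, expected_cols, delim="\t"):
    """Split one raw row into exactly expected_cols fields.

    Extra trailing fields are dropped; missing trailing fields become None.
    """
    fields = line.rstrip("\r\n").split(delim)
    fields = fields[:expected_cols]                   # ignore surplus columns
    fields += [None] * (expected_cols - len(fields))  # pad short rows
    return fields


wide = split_row("a\tb\tc\tEXTRA\r\n", expected_cols=3)
short = split_row("a\tb\r\n", expected_cols=3)
```

Here `wide` keeps only the first three fields and `short` gains a trailing `None`, so every row leaves the function with the same width regardless of what the file supplied.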

Is there any way around these limitations?

Glenn M asked Nov 25 '10 22:11



1 Answer

Glenn, I feel your pain :) SSIS cannot make the columns dynamic, because it needs to store metadata for each column as it comes through. And since flat files can contain any kind of data, it can't assume that a CRLF found in a column that is not the last column is really the end of the row it is supposed to read.

Unlike DTS in SQL 2000, you can't change the column metadata of an SSIS data flow while the package is running.

What you could do is create a parent package that opens the flat file in a script task and reads only the first line, to get the number of columns and the column names. This information can be stored in a variable.
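As a rough sketch of that first step, the following Python reads only the first line and returns the column count and names. It assumes the first line is a tab-delimited header row; the function name `read_header` and the sample data are hypothetical, and in a real package this logic would live in the script task's C# or VB.NET code.

```python
import io


def read_header(stream, delim="\t"):
    """Read only the first line and return (column_count, column_names)."""
    first_line = stream.readline().rstrip("\r\n")
    names = first_line.split(delim)
    return len(names), names


# An in-memory stand-in for the flat file; open(path, newline="") behaves the same way.
sample = io.StringIO("id\tname\tcity\r\n1\tAnn\tOslo\r\n")
column_count, column_names = read_header(sample)
```

The two results are what the parent package would store in variables before it touches the child package.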

Then the parent package loads the child package programmatically (a script task again) and updates the metadata of the child package's source connection. This is where you would:

  1. Add or remove columns to match the flat file.

  2. Set the column delimiter for each column. The last column's delimiter has to be CRLF, matching the row delimiter.

  3. Reinitialise the metadata (ComponentMetadata.ReinitializeMetadata()) of the source component in the data flow task, so that it recognises the recent changes to the source connection.

  4. Save the child SSIS package.

Details on programmatically modifying a package are readily available online.

Then your parent package just executes the child package (Execute Package Task), and it will run with your new mappings.

guna answered Oct 17 '22 16:10
