SSIS Dataflow CSV to SQL Server with DateTime2 column

I'm trying to load a flat file into SQL Server via an SSIS Data Flow task. In the file, the column arrives in the shape 20140311115000. If I set Fast Parse to False, I can get the column to import, but only if I first change the value to 2014-03-11 11:50:00. That isn't optimal, since I don't have control over the upstream files we're given and I'd rather not parse every column/row/table. In my file connection, I have the column defined as DT_DBTIMESTAMP2. With the condensed format, I get the following error:

[ADO NET Destination [2]] Error: System.ArgumentOutOfRangeException:
Year, Month, and Day parameters describe an un-representable DateTime.
at System.DateTime.DateToTicks(Int32 year, Int32 month, Int32 day)...

Is there a way to make the shorter column format (20140311115000) import properly?

Asked May 05 '15 by Adam Wenger

1 Answer

To expand on my comments since this appears to be an acceptable answer:

I would handle this in SSIS with a Derived Column. It has a few major advantages: first, it uses the same parsing method as the rest of your import process, so you don't have to worry about parsing fields differently elsewhere; second, it's all done in memory, so you're not writing your data to disk twice; third, the transformation is done by SSIS rather than the SQL Server engine, so it won't suffer from resource contention (especially if your SSIS runs on another server); fourth, Derived Columns use synchronous stream processing, which is about as fast as it gets.

The way I'd do this is to define the field in the CSV's file connection as a string (DT_STR) of length 14. I tend to rename the CSV's input column to "{SourceColumn}_STR" or "{SourceColumn}_RAW", because you need unique input and output column names, and this lets me use "{SourceColumn}" later as the name of the Derived Column. That makes mapping the destination just a bit easier, IMO. If you weren't changing the data type you could get away with replacing the column, but if you change the data type you must also give it a new column name, AFAIK.

So, next you create your Flat File Data Source as normal in the Data Flow Task. Next, add in a Derived Column transformation. Edit the transform, give the new column a name of "{SourceColumn}", configure it to Add as a New Column, and format the string and type cast it with an expression like:

(DT_DBTIMESTAMP2, 2)(SUBSTRING(MyDateColumn,1,4) + "-" + SUBSTRING(MyDateColumn,5,2) + "-" + SUBSTRING(MyDateColumn,7,2) + " " + SUBSTRING(MyDateColumn,9,2) + ":" + SUBSTRING(MyDateColumn,11,2) + ":" + SUBSTRING(MyDateColumn,13,2))
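Outside of SSIS, the same string surgery is easy to sanity-check. This is a minimal Python sketch of the expression above (the function name is mine, not part of SSIS), reslicing a yyyymmddhhmmss string into the dashed form the cast accepts:

```python
def compact_to_iso(raw: str) -> str:
    """Reformat a 14-character yyyymmddhhmmss string as 'yyyy-mm-dd hh:mm:ss'."""
    # Same slices as the SUBSTRING calls above (SSIS SUBSTRING is 1-based,
    # Python slicing is 0-based, but the pieces are identical).
    return (raw[0:4] + "-" + raw[4:6] + "-" + raw[6:8] + " "
            + raw[8:10] + ":" + raw[10:12] + ":" + raw[12:14])

print(compact_to_iso("20140311115000"))  # 2014-03-11 11:50:00
```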

I tend to use the formats from the TechNet wiki page for SSIS expressions and from the SSIS documentation on casting, simply because SSIS data types are different from SQL Server data types even though they map cleanly. DT_GUID, for example, requires curly braces, while UNIQUEIDENTIFIER doesn't.

This performs very well in my experience. The only import I currently have using this method is fairly small and runs on fairly moderate hardware: about 12,000 records, but each record is about 4 KB with ~240 fields, and six or seven of them get transformed. Most of those turn strings into DT_GUIDs by adding dashes and curly braces, but one corrects malformed dates similar to this. The whole process, including the data write, takes between 1 and 2 seconds.
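For what it's worth, the GUID fix-up mentioned above is the same kind of reslicing. A hedged Python sketch (hypothetical helper, not SSIS code) of turning a bare 32-character hex string into the brace-and-dash form DT_GUID expects:

```python
def hex_to_guid(raw: str) -> str:
    """Insert dashes into a 32-char hex string and wrap it in curly braces,
    mirroring the 8-4-4-4-12 GUID layout."""
    parts = (raw[0:8], raw[8:12], raw[12:16], raw[16:20], raw[20:32])
    return "{" + "-".join(parts) + "}"

print(hex_to_guid("0123456789abcdef0123456789abcdef"))
# {01234567-89ab-cdef-0123-456789abcdef}
```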

Answered Sep 20 '22 by Bacon Bits