When an excel data source is used in SSIS, the data types of each individual column are derived from the data in the columns. Is it possible to override this behaviour?
Ideally we would like every column delivered from the excel source to be string data type, so that data validation can be performed on the data received from the source in a later step in the data flow.
Currently, the Error Output tab can be used to ignore conversion failures - the data in question is then null, and the package will continue to execute. However, we want to know what the original data was so that an appropriate error message can be generated for that row.
To convert data to a different data typeClick the Data Flow tab, and then, from the Toolbox, drag the Data Conversion transformation to the design surface.
The way Excel import works make it not possible to automate it. You have to modify most of the excel files manually in order to load them. This is why we are no longer using ODBC/OleDB/Ms Jet for Excel connections. Our ETL solutions work correctly with Excel all the time.
For Excel it will append data. There is no such option available for overwriting data. You have to delete and recreate the file through the File System task.
Conditional Split: This transformation is used to accepts input and determine which destination to pipe the data into based on the result of an expression. It redirects rows of data that meet specific conditions to different outputs.
According to this blog post, the problem is that the SSIS Excel driver determines the data type for each column based on reading values of the first 8 rows:
- If the top 8 records contain equal number of numeric and character types – then the priority is numeric
- If the majority of top 8 records are numeric then it assigns the data type as numeric and all character values are read as NULLs
- If the majority of top 8 records are of character type then it assigns the data type as string and all numeric values are read as NULLs
The post outlines two things you can do to fix this:
IMEX=1
to the end of your Excel driver connection string. This will allow Excel to read the values as Unicode. However, this is not sufficient if the data in the first 8 rows are numeric.HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Nod\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows
to 0. This will ensure that the driver looks at all the rows to determine the data type for the column.Yes, you can. Just go into the output column list on the Excel source and set the type for each of the columns.
To get to the input columns list right click on the Excel source, select 'Show Advanced Editor', click the tab labeled 'Input and Output Properties'.
A potentially better solution is to use the derived column component where you can actually build "new" columns for each column in Excel. This has the benefits of
If your Excel file contains a number in the column in question in the first row of data, it seems that the SSIS engine will reset the type to a numeric type. It kept resetting mine. I went into my Excel file and changed the numbers to "Numbers stored as text" by placing a single quote in front of them. They are now read as text.
I also noticed that SSIS uses the first row to IGNORE what the programmer has indicated is the actual type of the data (I even told Excel to format the entire column as TEXT, but SSIS still used the data, which was a bunch of digits), and reset it. Once I fixed that by putting a single-quote in my Excel file in front of the number in the first row of data, I thought it would get it right, but no, there is additional work.
In fact, even though the SSIS External DataSource Column now has the type DT_WSTR, it will still read 43567192 as 4.35671E+007. So you have to go back into your Excel file and put single quotes in front of all the numbers.
Pretty LAME, Microsoft! But there's your solution. I have no idea what to do if the Excel file is not under your control.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With