Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SSIS Excel Data Source - Is it possible to override column data types?

Tags:

excel

ssis

When an excel data source is used in SSIS, the data types of each individual column are derived from the data in the columns. Is it possible to override this behaviour?

Ideally we would like every column delivered from the excel source to be string data type, so that data validation can be performed on the data received from the source in a later step in the data flow.

Currently, the Error Output tab can be used to ignore conversion failures - the data in question is then null, and the package will continue to execute. However, we want to know what the original data was so that an appropriate error message can be generated for that row.

like image 339
Hugh Mullally Avatar asked Feb 12 '09 10:02

Hugh Mullally


People also ask

How do I change datatype of a column in SSIS?

To convert data to a different data typeClick the Data Flow tab, and then, from the Toolbox, drag the Data Conversion transformation to the design surface.

What are the problems with Excel source in SSIS?

The way Excel import works make it not possible to automate it. You have to modify most of the excel files manually in order to load them. This is why we are no longer using ODBC/OleDB/Ms Jet for Excel connections. Our ETL solutions work correctly with Excel all the time.

How do I overwrite data in Excel using SSIS?

For Excel it will append data. There is no such option available for overwriting data. You have to delete and recreate the file through the File System task.

Which transformation would you use to change the data type of the input column to match your destination column?

Conditional Split: This transformation is used to accepts input and determine which destination to pipe the data into based on the result of an expression. It redirects rows of data that meet specific conditions to different outputs.


3 Answers

According to this blog post, the problem is that the SSIS Excel driver determines the data type for each column based on reading values of the first 8 rows:

  • If the top 8 records contain equal number of numeric and character types – then the priority is numeric
  • If the majority of top 8 records are numeric then it assigns the data type as numeric and all character values are read as NULLs
  • If the majority of top 8 records are of character type then it assigns the data type as string and all numeric values are read as NULLs

The post outlines two things you can do to fix this:

  1. First, add IMEX=1 to the end of your Excel driver connection string. This will allow Excel to read the values as Unicode. However, this is not sufficient if the data in the first 8 rows are numeric.
  2. In the registry, change the value for HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Nod\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows to 0. This will ensure that the driver looks at all the rows to determine the data type for the column.
like image 171
Ryan Kohn Avatar answered Oct 03 '22 21:10

Ryan Kohn


Yes, you can. Just go into the output column list on the Excel source and set the type for each of the columns.

To get to the input columns list right click on the Excel source, select 'Show Advanced Editor', click the tab labeled 'Input and Output Properties'.

A potentially better solution is to use the derived column component where you can actually build "new" columns for each column in Excel. This has the benefits of

  1. You have more control over what you convert to.
  2. You can put in rules that control the change (i.e. if null give me an empty string, but if there is data then give me the data as a string)
  3. Your data source is not tied directly to the rest of the process (i.e. you can change the source and the only place you will need to do work is in the derived column)
like image 22
Robert MacLean Avatar answered Oct 03 '22 21:10

Robert MacLean


If your Excel file contains a number in the column in question in the first row of data, it seems that the SSIS engine will reset the type to a numeric type. It kept resetting mine. I went into my Excel file and changed the numbers to "Numbers stored as text" by placing a single quote in front of them. They are now read as text.

I also noticed that SSIS uses the first row to IGNORE what the programmer has indicated is the actual type of the data (I even told Excel to format the entire column as TEXT, but SSIS still used the data, which was a bunch of digits), and reset it. Once I fixed that by putting a single-quote in my Excel file in front of the number in the first row of data, I thought it would get it right, but no, there is additional work.

In fact, even though the SSIS External DataSource Column now has the type DT_WSTR, it will still read 43567192 as 4.35671E+007. So you have to go back into your Excel file and put single quotes in front of all the numbers.

Pretty LAME, Microsoft! But there's your solution. I have no idea what to do if the Excel file is not under your control.

like image 44
Dave Scotese Avatar answered Oct 03 '22 23:10

Dave Scotese