Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data profiling Task - custom Profile Request

Is there any option to create a custom Profile Request for SSIS Data Profiling Task?

At the moment there are 5 standard profile requests under SSIS Data Profiling task:

  1. Column Null Ratio Profile Request
  2. Column Statistics Profile Request
  3. Column Length Distribution Profile Request
  4. Column Value Distribution Profile Request
  5. Candidate Key Profile Request

I need to add another one (Custom one) to get summary of all numeric values.

Thanks in advance for your helps.

like image 739
Barsham Avatar asked Aug 03 '18 05:08

Barsham


2 Answers

Based on this Microsoft Documentation, SSIS Data profiling Task has only 5 main profiles (listed on your question) and there is no option to add a custom profile.

For a similar reason, i will create an Execute SQL Task to achieve that, you can use the aggregate functions you need and ISNUMERIC function in the where clause :

SELECT MAX(CAST([Column] AS BIGINT)) -- Maximum value
       ,MIN(CAST([Column] AS BIGINT)) -- Minimum value
       ,COUNT(Column) -- Count values
       ,COUNT(DISTINCT [Column]) -- Count distinct values
       ,AVG(CAST([Column] AS BIGINT)) -- Average
       ,SUM(CAST([Column] AS BIGINT)) -- Sum
FROM TABLE
WHERE ISNUMERIC([Column]) = 1
like image 148
Hadi Avatar answered Oct 31 '22 16:10

Hadi


I think what you want to do here is create a computed column that is populated with your source column only if IsNumeric(SourceColumn) = 1.

Then create a profile task using Column Value Distribution Profile Request on the computed column, with ValueDistributionOption set to AllValues.

Edit:

To further clarify, the computed column doesn't have to be a task in SSIS, although that's how I was thinking about it when I came up with my answer. You could ALTER the table you want to profile, adding the computed column, and then create the Profile Task as I explained above.

I was also under the assumption that you wanted to profile the values of a single column. If you're wanting to do this for multiple columns, or need to profile the summary values aggregated from details records, then this answer may not be the best solution.

like image 23
digital.aaron Avatar answered Oct 31 '22 15:10

digital.aaron