
SSIS Data Flow How to Remove Duplicate Rows but Log the Duplicates in SSIS

I learned from Remove duplicates in SSIS Data Flow how to use the Sort transformation to remove rows with duplicate data values.

In my case, I'm reading a delimited file and need to eliminate the duplicates and log the rows that had duplicate keys. I need to write those rows to another delimited file, which I will email back to the customer so they can correct the data and try again.

I can't quite figure out how to do this, though. I'll be experimenting with Aggregate and Merge Join, but I hope there's a known pattern for doing this.

John Saunders asked Feb 20 '23 06:02


1 Answer

My answer works with any data. Some solutions on the internet require the rows to have a primary key; this one does not. Here is a sample structure and dataset:

a   b
1   23
1   23
16  59
12  12
13  45
12  12
45  56


Group by all columns and add a count as the last column. In the Aggregate transformation, add every column with its operation set to "Group by", then add a "Count all" column at the end. This works the same way no matter how many columns there are.

enter image description here
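Outside SSIS, the grouping step can be sketched in a few lines of Python to show what the Aggregate output looks like for the sample dataset. This is only an illustration of the logic, not SSIS code; the function name `aggregate_count_all` is made up for this example.

```python
from collections import Counter

# Sample rows from the dataset above, as (a, b) tuples.
rows = [(1, 23), (1, 23), (16, 59), (12, 12), (13, 45), (12, 12), (45, 56)]

def aggregate_count_all(rows):
    """Group by every column and append a count (hypothetical helper).

    Mirrors an Aggregate with "Group by" on each column plus "Count all".
    """
    counts = Counter(rows)  # keys keep first-seen order on Python 3.7+
    # One output row per distinct input row: the grouped columns plus the count.
    return [row + (n,) for row, n in counts.items()]

aggregated = aggregate_count_all(rows)
for a, b, count in aggregated:
    print(a, b, count)
```

Rows `(1, 23)` and `(12, 12)` come out with a count of 2; every other row has a count of 1.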

Then add a Conditional Split transformation and route the rows whose count is greater than 1 to a separate output. Those are the rows that appeared more than once.

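The split step, together with writing the duplicates to a delimited log, can be sketched the same way. Again this is a Python illustration of the logic rather than anything SSIS runs. The helper names `split_duplicates` and `to_delimited`, the tab delimiter, and the `count` header are assumptions made for this example.

```python
import csv
import io
from collections import Counter

def split_duplicates(rows):
    """Return (distinct_rows, duplicated_rows) for a list of row tuples.

    Hypothetical helper: the count > 1 test plays the role of the
    Conditional Split condition.
    """
    counts = Counter(rows)
    distinct = list(counts)  # each distinct row once, in first-seen order
    duplicated = [row + (n,) for row, n in counts.items() if n > 1]
    return distinct, duplicated

def to_delimited(header, rows, delimiter="\t"):
    """Render rows as delimited text, standing in for a flat file destination."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

rows = [(1, 23), (1, 23), (16, 59), (12, 12), (13, 45), (12, 12), (45, 56)]
distinct, duplicated = split_duplicates(rows)

# Text of the log that would be sent back for correction.
duplicate_log = to_delimited(["a", "b", "count"], duplicated)
print(duplicate_log)
```

Here `distinct` holds the de-duplicated rows for the main destination, while `duplicate_log` lists only the rows that occurred more than once, along with how many times each appeared.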

[Image: real example]

Justin answered Apr 26 '23 14:04