I learned from Remove duplicates in SSIS Data Flow how to use the Sort transformation to remove rows with duplicate data values.
In my case, I'm reading a delimited file, need to eliminate the duplicates, and to log the rows which had the duplicate keys. I need to output those rows to another delimited file, and will email it back to the customer so they can correct the data and try again.
I can't quite figure out how to do this, though. I'll be experimenting with Aggregate and Merge Join, but I hope there's a known pattern for doing this.
Hi my answer gonna work with any data, because some solutions in internet need primary key of rows, for my solution primary key is not required. Here sample structure and sample dataset:
a b
1 23
1 23
16 59
12 12
13 45
12 12
45 56
Just group by all columns and add last column - count all (If there are more than two columns or more, you just need in "Aggregate" element put all columns and foreach set group by and in the end put "Count All" column):
Then just add conditional split element and take all rows where are more than 1 same row:
Real Example:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With