SSIS Data Flow How to Remove Duplicate Rows but Log the Duplicates in SSIS

Question

I learned from Remove duplicates in SSIS Data Flow how to use the Sort transformation to remove rows with duplicate data values.

In my case, I'm reading a delimited file, need to eliminate the duplicates, and to log the rows which had the duplicate keys. I need to output those rows to another delimited file, and will email it back to the customer so they can correct the data and try again.

I can't quite figure out how to do this, though. I'll be experimenting with Aggregate and Merge Join, but I hope there's a known pattern for doing this.

Justin · Accepted Answer

Hi my answer gonna work with any data, because some solutions in internet need primary key of rows, for my solution primary key is not required. Here sample structure and sample dataset:

enter image description here

Just group by all columns and add last column - count all (If there are more than two columns or more, you just need in "Aggregate" element put all columns and foreach set group by and in the end put "Count All" column):

enter image description here

Then just add conditional split element and take all rows where are more than 1 same row:

enter image description here

Real Example:

enter image description here

SSIS Data Flow How to Remove Duplicate Rows but Log the Duplicates in SSIS

Tags:

duplicates

ssis

John Saunders

1 Answers

Justin

Recent Activity

Donate For Us

SSIS Data Flow How to Remove Duplicate Rows but Log the Duplicates in SSIS

Tags:

duplicates

ssis

John Saunders

1 Answers

Justin

Related questions

Recent Activity

Donate For Us