I am working on an SSIS data flow task.
The source table comes from an old, denormalized database.
The destination table is normalized.
SSIS fails because the transfer violates the primary key: the denormalized source contains duplicate values for the destination's primary key column.
It would be ideal if SSIS could check the destination for the current record (by its key) and, if it already exists, skip it and continue with the next record.
Is there a way to handle this scenario?
In SSIS, the Sort transformation can remove duplicate rows as part of its sort.
SELECT supplier_id FROM suppliers
UNION ALL
SELECT supplier_id FROM orders
ORDER BY supplier_id;

This SQL UNION ALL example returns supplier_id multiple times if the same value appears in both the suppliers and orders tables: the UNION ALL operator does not remove duplicates (plain UNION does).
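To see the difference concretely, here is a small sketch using Python's sqlite3 module with two hypothetical in-memory tables standing in for suppliers and orders:

```python
import sqlite3

# Hypothetical in-memory tables standing in for suppliers and orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suppliers (supplier_id INTEGER);
    CREATE TABLE orders    (supplier_id INTEGER);
    INSERT INTO suppliers VALUES (1), (2);
    INSERT INTO orders    VALUES (2), (3);
""")

# UNION ALL keeps the duplicate supplier_id 2 ...
union_all = [r[0] for r in conn.execute(
    "SELECT supplier_id FROM suppliers "
    "UNION ALL SELECT supplier_id FROM orders "
    "ORDER BY supplier_id")]
print(union_all)   # [1, 2, 2, 3]

# ... while plain UNION removes it.
union = [r[0] for r in conn.execute(
    "SELECT supplier_id FROM suppliers "
    "UNION SELECT supplier_id FROM orders "
    "ORDER BY supplier_id")]
print(union)       # [1, 2, 3]
```

So if you use a union-style query to stage rows for the destination, plain UNION already deduplicates; UNION ALL does not.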
Assuming your destination table is a subset of your source table, you should be able to use the Sort Transformation to pull in only the columns you need for your destination table, and then check the "Remove rows with duplicate sort values" to basically give you a distinct list of records based on the columns you selected.
Then, simply route the results of the sort to your destination, and you should be good to go.
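Deduplicating the source with the Sort transformation handles duplicates within the batch, but if the destination may already contain some of the keys, you also need the "insert only if the key is absent" pattern the question asks for. In SSIS this is typically a Lookup transformation against the destination table, with unmatched rows redirected to the destination. The sketch below shows the equivalent set-based logic using Python's sqlite3 as a stand-in; the table and column names (dest, src, pk) are hypothetical:

```python
import sqlite3

# Sketch of "skip rows whose key already exists in the destination".
# dest already holds pk=1; src contains a duplicate of it, plus an
# internal duplicate (pk=2 twice) and a genuinely new row (pk=3).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dest (pk INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE src  (pk INTEGER, val TEXT);
    INSERT INTO dest VALUES (1, 'old');
    INSERT INTO src  VALUES (1, 'x'), (2, 'y'), (2, 'y'), (3, 'z');
""")

# NOT EXISTS filters out keys already present in dest (the Lookup's
# "no match" output); GROUP BY collapses duplicates within src itself.
conn.execute("""
    INSERT INTO dest (pk, val)
    SELECT s.pk, MIN(s.val)
    FROM src s
    WHERE NOT EXISTS (SELECT 1 FROM dest d WHERE d.pk = s.pk)
    GROUP BY s.pk
""")

rows = conn.execute("SELECT pk, val FROM dest ORDER BY pk").fetchall()
print(rows)   # [(1, 'old'), (2, 'y'), (3, 'z')]
```

If you prefer to do this outside the data flow entirely, the same WHERE NOT EXISTS insert can run as an Execute SQL task against a staging table.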