
How to remove duplicate rows from flat file using SSIS?

Let me first say that being able to take 17 million records from a flat file, push them to a database on a remote box, and have it take 7 minutes is amazing. SSIS truly is fantastic. But now that I have that data up there, how do I remove duplicates?

Better yet, I want to take the flat file, remove the duplicates from it, and write the results back out to another flat file.

I am thinking about a:

Data Flow Task

  • File source (with an associated file connection)
  • A for loop container
  • A script container that contains some logic to tell if another row exists

Thank you; everyone on this site is incredibly knowledgeable.

Update: I have found this link; it might help in answering this question.

asked Sep 29 '08 by RyanKeeter



2 Answers

Use the Sort Component.

Simply choose which fields you wish to sort your loaded rows by, and in the bottom left corner you'll see a check box to remove duplicates. This option removes any rows that are duplicates based on the sort criteria only. So in the example below, the two rows would be considered duplicates if we sorted on the first field alone:

1 | sample A |
1 | sample B |
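Outside of SSIS, the Sort component's behavior can be illustrated in a few lines. This is a minimal sketch, not SSIS itself: it keeps the first row seen for each sort-key value and drops the rest, assuming rows are tuples and the key is the first field.

```python
def dedupe_on_key(rows, key_index=0):
    """Keep only the first row seen for each value of the sort key,
    mimicking the Sort component's remove-duplicates option."""
    seen = set()
    result = []
    for row in rows:
        key = row[key_index]
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

rows = [
    ("1", "sample A"),
    ("1", "sample B"),  # duplicate key "1" -> dropped
    ("2", "sample C"),
]
print(dedupe_on_key(rows))  # → [('1', 'sample A'), ('2', 'sample C')]
```

Note that, just as with the Sort component, which of the duplicate rows survives depends only on order of arrival, since the non-key fields are never compared.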
answered Oct 09 '22 by Craig Warren


I would suggest using SSIS to copy the records to a temporary table, then create a task that uses SELECT DISTINCT or RANK, depending on your situation, to select the non-duplicate rows. That task would funnel the results to a flat file and delete them from the temporary table. The last step would be to copy the records from the temporary table into the destination table.
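The SQL side of that approach can be sketched briefly. This uses sqlite3 as a stand-in for SQL Server, and the table and column names are assumptions for illustration; in a real package the staging load and the final copy would be SSIS Data Flow tasks.

```python
# Sketch of the temp-table approach: stage all rows, then copy only
# distinct rows into the destination. sqlite3 stands in for SQL Server.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [(1, "sample A"), (1, "sample A"), (2, "sample B")],  # one exact dup
)

# SELECT DISTINCT collapses fully identical rows in one set-based pass.
conn.execute("CREATE TABLE destination AS SELECT DISTINCT id, val FROM staging")

print(conn.execute("SELECT * FROM destination ORDER BY id").fetchall())
# → [(1, 'sample A'), (2, 'sample B')]
```

SELECT DISTINCT handles exact-duplicate rows; RANK (or ROW_NUMBER) over a partition is the tool when only some columns define a duplicate and you need to pick which copy to keep.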

Determining duplicates is something SQL is good at, but a flat file is not as well suited for it. In the approach you proposed, the script container would load a row, compare it against all 17 million records, load the next row, and repeat. The performance might not be all that great.
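For completeness, a script-based pass does not have to rescan the file for every row: keeping rows already seen in a hash set turns each check into a single lookup. A sketch, assuming the whole line serves as the duplicate key (memory for 17 million keys is the trade-off):

```python
def dedupe_stream(lines):
    """Single-pass dedup: remember each row in a set, so every new row
    costs one hash lookup instead of a scan over all previous rows."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line

data = ["1|sample A", "1|sample A", "2|sample B"]
print(list(dedupe_stream(data)))  # → ['1|sample A', '2|sample B']
```

Even so, the set-based SQL approach above keeps the work inside the database engine, which is usually the better fit at this scale.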

answered Oct 09 '22 by Timothy Lee Russell