In our system we run hourly imports from an external database. Due to an error in the import scripts, there are now some duplicate records.
A duplicate is defined as any record that has the same :legacy_id and :company as another record.
What code can I run to find and delete these duplicates?
I was playing around with this:
Product.select(:legacy_id, :company).group(:legacy_id, :company).having("count(*) > 1")
It seemed to return some of the duplicates, but I wasn't sure how to delete them from there.
Any ideas?
You can try the following approach, which keeps the row with the lowest id in each (legacy_id, company) group and deletes everything else:
Product.where.not(
  id: Product.group(:legacy_id, :company).pluck(Arel.sql('MIN(products.id)'))
).delete_all
(On Rails 5.2 and later, raw SQL passed to pluck has to be wrapped in Arel.sql; on older versions the plain string 'min(products.id)' works as-is.)
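If you need ActiveRecord callbacks or dependent associations to run, or you'd rather keep the newest record in each group instead of the oldest, a variant like this should work (a sketch, assuming ids increase with insertion order; destroy is much slower than delete_all because it loads and removes records one by one):
keep_ids = Product.group(:legacy_id, :company).pluck(Arel.sql('MAX(products.id)'))
Product.where.not(id: keep_ids).find_each(&:destroy)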
Or in pure SQL:
delete from products
where id not in (
  select min(p.id) from products p
  group by p.legacy_id, p.company
);
(Note that MySQL will not let you delete from a table that is also referenced in the subquery; there you would need to wrap the subquery in a derived table first.)
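Finally, to build on the group/having relation from the question: you can also resolve duplicates group by group. This is slower, since it issues one query per duplicated pair, but it is easy to reason about (again a sketch; it keeps the lowest id in each group):
# Find each (legacy_id, company) pair that appears more than once,
# then delete all but the first (lowest-id) record in that group.
Product.group(:legacy_id, :company)
       .having('COUNT(*) > 1')
       .pluck(:legacy_id, :company)
       .each do |legacy_id, company|
  ids = Product.where(legacy_id: legacy_id, company: company).order(:id).pluck(:id)
  Product.where(id: ids.drop(1)).delete_all # keep ids.first
end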