
Remove duplicate records based on multiple columns?

Tags: duplicates, ruby-on-rails-3, activerecord, destroy

I'm using Heroku to host my Ruby on Rails application and for one reason or another, I may have some duplicate rows.

Is there a way to delete duplicate records based on 2 or more criteria but keep just 1 record of that duplicate collection?

In my use case, I have a Make and Model relationship for cars in my database.

    Make      Model
    ---       ---
    Name      Name
              Year
              Trim
              MakeId

I'd like to delete all Model records that have the same Name, Year and Trim, but keep one record from each set of duplicates (meaning, I need the record but only once). I'm using the Heroku console, so I can run ActiveRecord queries easily.

Any suggestions?


asked Jan 02 '13 15:01

sergserg

2 Answers

    class Model

      def self.dedupe
        # find all models and group them on keys which should be common
        grouped = all.group_by { |model| [model.name, model.year, model.trim, model.make_id] }
        grouped.values.each do |duplicates|
          # the first one we want to keep right?
          first_one = duplicates.shift # or pop for last one
          # if there are any more left, they are duplicates
          # so delete all of them
          duplicates.each { |double| double.destroy } # duplicates can now be destroyed
        end
      end

    end

    Model.dedupe

1. Find all records.
2. Group them on the keys that define uniqueness.
3. Loop over the values of the grouped hash.
4. Remove the first value from each group, because you want to retain one copy.
5. Delete the rest.
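The steps above can be tried outside Rails with a minimal plain-Ruby sketch. The `Car` struct, the `dedupe` helper and the sample rows are hypothetical stand-ins for the ActiveRecord model and its data, and the sketch returns the surplus records instead of calling `destroy` on them.

```ruby
# Hypothetical stand-in for the Model table; not part of the original answer.
Car = Struct.new(:id, :name, :year, :trim, :make_id)

# Splits records into [kept, duplicates], keeping the first record of each
# group that shares the same name, year, trim and make_id.
def dedupe(records)
  kept = []
  duplicates = []
  records.group_by { |r| [r.name, r.year, r.trim, r.make_id] }.each_value do |group|
    first, *rest = group
    kept << first            # the copy we retain
    duplicates.concat(rest)  # everything else in the group is surplus
  end
  [kept, duplicates]
end

# Made-up sample data for illustration only.
cars = [
  Car.new(1, "Civic", 2012, "LX", 1),
  Car.new(2, "Civic", 2012, "LX", 1),
  Car.new(3, "Civic", 2012, "EX", 1),
  Car.new(4, "Civic", 2012, "LX", 1)
]

kept, duplicates = dedupe(cars)
puts kept.map(&:id).inspect        # prints [1, 3]
puts duplicates.map(&:id).inspect  # prints [2, 4]
```

In the real model, each record in the duplicates list is the one that would be destroyed.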

answered Sep 20 '22 04:09

Aditya Sanghi

Suppose your User table contains the following data:

    User.all
    => [
      #<User id: 15, name: "a", email: "a@gmail.com", created_at: "2013-08-06 08:57:09", updated_at: "2013-08-06 08:57:09">,
      #<User id: 16, name: "a1", email: "a@gmail.com", created_at: "2013-08-06 08:57:20", updated_at: "2013-08-06 08:57:20">,
      #<User id: 17, name: "b", email: "b@gmail.com", created_at: "2013-08-06 08:57:28", updated_at: "2013-08-06 08:57:28">,
      #<User id: 18, name: "b1", email: "b1@gmail.com", created_at: "2013-08-06 08:57:35", updated_at: "2013-08-06 08:57:35">,
      #<User id: 19, name: "b11", email: "b1@gmail.com", created_at: "2013-08-06 09:01:30", updated_at: "2013-08-06 09:01:30">,
      #<User id: 20, name: "b11", email: "b1@gmail.com", created_at: "2013-08-06 09:07:58", updated_at: "2013-08-06 09:07:58">
    ]

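Some email addresses appear more than once, and records 19 and 20 share both name and email. Our aim is to remove those duplicates from the users table, keeping one record for each distinct email and name combination.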

Step 1:

Get the lowest id for each distinct combination of email and name.

    ids = User.select("MIN(id) as id").group(:email, :name).collect(&:id)
    => [15, 16, 18, 19, 17]

Step 2:

Remove every record from the users table whose id is not in that list.

The ids array now holds the following ids:

    [15, 16, 18, 19, 17]

    User.where("id NOT IN (?)", ids)  # To get all duplicate records
    User.where("id NOT IN (?)", ids).destroy_all

Rails 4:

ActiveRecord 4 introduced the where.not method, which lets you write Step 2 as follows:

User.where.not(id: ids).destroy_all
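To see why only one record ends up being removed for the sample data, the two steps can be mimicked in plain Ruby. This is only an illustrative sketch: the `Row` struct, the `ROWS` constant and the `ids_to_keep` helper are hypothetical names, and in the real application the grouping is done by the database through the queries above.

```ruby
# Hypothetical in-memory copy of the sample rows (id, name, email).
Row = Struct.new(:id, :name, :email)

ROWS = [
  Row.new(15, "a",   "a@gmail.com"),
  Row.new(16, "a1",  "a@gmail.com"),
  Row.new(17, "b",   "b@gmail.com"),
  Row.new(18, "b1",  "b1@gmail.com"),
  Row.new(19, "b11", "b1@gmail.com"),
  Row.new(20, "b11", "b1@gmail.com")
].freeze

# Step 1 equivalent: lowest id per distinct (email, name) pair,
# mirroring MIN(id) with GROUP BY email, name.
def ids_to_keep(rows)
  rows.group_by { |r| [r.email, r.name] }.values.map { |group| group.map(&:id).min }
end

ids = ids_to_keep(ROWS)

# Step 2 equivalent: every id not in the keep list belongs to a duplicate.
duplicate_ids = ROWS.map(&:id) - ids

puts ids.sort.inspect       # prints [15, 16, 17, 18, 19]
puts duplicate_ids.inspect  # prints [20]
```

Grouping on both email and name means records 15 and 16 are kept even though they share an email address; grouping on email alone would treat them as duplicates.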

answered Sep 20 '22 04:09

Aravind encore
