I'm using Heroku to host my Ruby on Rails application and for one reason or another, I may have some duplicate rows.
Is there a way to delete duplicate records based on 2 or more criteria but keep just 1 record of that duplicate collection?
In my use case, I have a Make and Model relationship for cars in my database.
Make Model --- --- Name Name Year Trim MakeId
I'd like to delete all Model records that have the same Name, Year and Trim but keep 1 of those records (meaning, I need the record but only once). I'm using Heroku console so I can run some active record queries easily.
Any suggestions?
In SQL, some rows contain duplicate entries in multiple columns(>1). For deleting such rows, we need to use the DELETE keyword along with self-joining the table with itself.
Delete Duplicate Rows based on Specific Columns To delete duplicate rows on the basis of multiple columns, specify all column names as a list. You can set 'keep=False' in the drop_duplicates() function to remove all the duplicate rows.
class Model def self.dedupe # find all models and group them on keys which should be common grouped = all.group_by{|model| [model.name,model.year,model.trim,model.make_id] } grouped.values.each do |duplicates| # the first one we want to keep right? first_one = duplicates.shift # or pop for last one # if there are any more left, they are duplicates # so delete all of them duplicates.each{|double| double.destroy} # duplicates can now be destroyed end end end Model.dedupe
If your User table data like below
User.all => [ #<User id: 15, name: "a", email: "[email protected]", created_at: "2013-08-06 08:57:09", updated_at: "2013-08-06 08:57:09">, #<User id: 16, name: "a1", email: "[email protected]", created_at: "2013-08-06 08:57:20", updated_at: "2013-08-06 08:57:20">, #<User id: 17, name: "b", email: "[email protected]", created_at: "2013-08-06 08:57:28", updated_at: "2013-08-06 08:57:28">, #<User id: 18, name: "b1", email: "[email protected]", created_at: "2013-08-06 08:57:35", updated_at: "2013-08-06 08:57:35">, #<User id: 19, name: "b11", email: "[email protected]", created_at: "2013-08-06 09:01:30", updated_at: "2013-08-06 09:01:30">, #<User id: 20, name: "b11", email: "[email protected]", created_at: "2013-08-06 09:07:58", updated_at: "2013-08-06 09:07:58">] 1.9.2p290 :099 >
Email id's are duplicate, so our aim is to remove all duplicate email ids from user table.
Step 1:
To get all distinct email records id.
ids = User.select("MIN(id) as id").group(:email,:name).collect(&:id) => [15, 16, 18, 19, 17]
Step 2:
To remove duplicate id's from user table with distinct email records id.
Now the ids array holds the following ids.
[15, 16, 18, 19, 17] User.where("id NOT IN (?)",ids) # To get all duplicate records User.where("id NOT IN (?)",ids).destroy_all
** RAILS 4 **
ActiveRecord 4 introduces the .not
method which allows you to write the following in Step 2:
User.where.not(id: ids).destroy_all
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With