 

How to find records that have duplicate data using Active Record

What is the best way to find records with duplicate values in a column using Ruby and the new ActiveRecord?

asked Feb 24 '11 at 14:02 by srboisvert




6 Answers

Translating @TuteC's SQL into ActiveRecord:

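# Select the grouped column (name) rather than id: id is not part of the
# GROUP BY, so PostgreSQL and MySQL's ONLY_FULL_GROUP_BY mode reject it.
# Repeat COUNT(id) in HAVING, because PostgreSQL does not allow output
# aliases such as quantity in a HAVING clause.
sql = 'SELECT name,
         COUNT(id) AS quantity
         FROM types
         GROUP BY name
         HAVING COUNT(id) > 1'
#=>
Type.select("name, COUNT(id) AS quantity")
  .group(:name)
  .having("COUNT(id) > 1")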
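Why selecting id alongside GROUP BY name is a problem is easiest to see outside the database: each name group can hold several rows, so there is no single id to report for it. Below is a minimal plain-Ruby sketch (not ActiveRecord) of GROUP BY name HAVING COUNT(id) > 1 that keeps every id in each duplicated group; the TypeRow struct, the duplicate_ids_by_name helper and the sample rows are hypothetical.

```ruby
# Hypothetical stand-in for rows of the types table.
TypeRow = Struct.new(:id, :name)

# Returns { name => [ids...] } for every name shared by more than one row.
def duplicate_ids_by_name(rows)
  groups = rows.group_by(&:name)                           # GROUP BY name
  dupes = groups.select { |_name, group| group.size > 1 }  # HAVING COUNT(id) > 1
  dupes.transform_values { |group| group.map(&:id) }       # every id in the group, not just one
end

rows = [
  TypeRow.new(1, "bolt"),
  TypeRow.new(2, "nut"),
  TypeRow.new(3, "bolt"),
  TypeRow.new(4, "washer"),
  TypeRow.new(5, "bolt")
]

result = duplicate_ids_by_name(rows)
# result == { "bolt" => [1, 3, 5] }
```

In SQL the same idea needs an aggregate over the ids, such as PostgreSQL's array_agg(id) or MySQL's GROUP_CONCAT(id), instead of a bare id column.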
answered Oct 03 '22 at 13:10 by fl00r


Here's how I solved it with ActiveRecord's query methods, without writing a full SQL query:

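# Group by last_name, keep names used more than once, and collect the
# [last_name, total] pairs into a single Hash.
Person.select("COUNT(last_name) AS total, last_name")
  .group(:last_name)
  .having("COUNT(last_name) > 1")
  .order(:last_name)
  .map { |p| [p.last_name, p.total] }
  .to_h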

Really, it's just a nicer way to write the SQL. It returns each last_name that appears more than once, along with how many records share it, keyed by last name.
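For comparison, here is the same count, filter and order pipeline in plain Ruby over an in-memory array: a minimal sketch assuming Ruby 2.7+ for Enumerable#tally, with a hypothetical duplicate_last_name_counts helper and made-up sample names. Note that mapping each row to its own { name => total } hash gives an array of one-key hashes; the pairs have to go through to_h to become a single Hash.

```ruby
# Count each last name, keep those seen more than once, sort by name,
# and return one Hash of last_name => count.
def duplicate_last_name_counts(last_names)
  totals = last_names.tally                           # GROUP BY last_name + COUNT(last_name)
  dupes = totals.select { |_name, total| total > 1 }  # HAVING COUNT(last_name) > 1
  dupes.sort.to_h                                     # ORDER BY last_name, back to a Hash
end

names = %w[Smith Jones Brown Smith Jones Smith Lee]
counts = duplicate_last_name_counts(names)
# counts == { "Jones" => 2, "Smith" => 3 }
```

Doing this in Ruby means loading every name first; the ActiveRecord version lets the database do the counting and only returns the duplicated names.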

answered Oct 03 '22 at 13:10 by brookr


I was beating my head against this problem with a 2016 stack (Rails 4.2, Ruby 2.2), and got what I wanted with this:

> Model.select([:thing]).group(:thing).having("count(thing) > 1").all.size
 => {"name1"=>5, "name2"=>4, "name3"=>3, "name4"=>2, "name5"=>2}
answered Oct 03 '22 at 12:10 by Sam


With custom SQL, this finds types that share the same name:

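# Return each duplicated name with its row count. Selecting id here would
# fail under PostgreSQL or MySQL's ONLY_FULL_GROUP_BY, since id is not grouped.
sql = 'SELECT name, COUNT(id) AS quantity FROM types
         GROUP BY name HAVING COUNT(id) > 1'
# execute returns a raw, adapter-specific result object.
repeated = ActiveRecord::Base.connection.execute(sql)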
answered Oct 03 '22 at 14:10 by TuteC


In Rails 2.x, calling select on an ActiveRecord class hits Ruby's private Kernel#select, so it can't be used to build a query. Use find instead:

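# Rails 2.x finder syntax; klass is your model class. Replace the
# :conditions placeholder with real conditions, or drop that line.
klass.find(:all,
  :select => "the_col, COUNT(the_col) AS num",
  :conditions => ["extra conditions here"],
  :group => 'the_col',
  :having => "COUNT(the_col) > 1")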
answered Oct 03 '22 at 13:10 by simianarmy


Here is a solution that extends the other answers to show how to find and iterate through the records grouped by the duplicate field:

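# Query 1: the values of :field that occur more than once.
duplicate_values = Model.group(:field).having(Model.arel_table[:field].count.gt(1)).count.keys
# Query 2: load only the records with those values and group them in Ruby.
Model.where(field: duplicate_values).group_by(&:field).each do |value, records|
  puts "The records with ids #{records.map(&:id).to_sentence} have field set to #{value}"
end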

It seems a shame this has to be done with two queries, but another answer confirms this approach.
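The two steps can be mirrored in plain Ruby to show what each query contributes. This is a minimal sketch over an in-memory array, not ActiveRecord: the Record struct and duplicate_groups helper are hypothetical stand-ins for Model, and Array#join replaces ActiveSupport's to_sentence so it runs without Rails.

```ruby
require "set"

# Hypothetical stand-in for Model rows.
Record = Struct.new(:id, :field)

def duplicate_groups(records)
  # Pass 1 (like the grouped COUNT query): which values occur more than once?
  counts = Hash.new(0)
  records.each { |r| counts[r.field] += 1 }
  duplicate_values = counts.select { |_value, n| n > 1 }.keys.to_set

  # Pass 2 (like where(field: ...) + group_by): keep matching records, grouped by value.
  records
    .select { |r| duplicate_values.include?(r.field) }
    .group_by(&:field)
end

records = [
  Record.new(1, "a"), Record.new(2, "b"), Record.new(3, "a"),
  Record.new(4, "c"), Record.new(5, "b")
]

duplicate_groups(records).each do |value, group|
  puts "The records with ids #{group.map(&:id).join(', ')} have field set to #{value}"
end
```

Grouping everything in memory is fine for small data sets; the two-query version in the answer keeps the counting in the database and only loads the records that actually have duplicates.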

answered Oct 03 '22 at 14:10 by eremite