
Finding all the users that have duplicate names

I have users with first_name and last_name fields, and I need a Ruby find that returns all the users that have duplicate accounts based on first and last names. For example, I want a find that will search through all the other users and report any that have the same name and email. I was thinking of a nested loop like this:

User.all.each do |user|
  # maybe another loop to search through all the users and, if a match occurs, put that user in an array
end

Is there a better way?

Matt Elhotiby asked Dec 30 '10 17:12


1 Answer

You could go a long way toward narrowing down your search by finding out what the duplicated data is in the first place. For example, say you want to find each combination of first name and email that is used more than once.

User.find(:all, :group => [:first, :email], :having => "count(*) > 1" )

That will return an array containing one record from each set of duplicates. From that, say one of the returned users had "Fred" and "[email protected]"; you could then search for only the Users having those values to find all of the affected users.

The return from that find will be something like the following. Note that the array only contains a single record from each set of duplicated users.

[#<User id: 3, first: "foo", last: "barney", email: "[email protected]", created_at: "2010-12-30 17:14:43", updated_at: "2010-12-30 17:14:43">, 
 #<User id: 5, first: "foo1", last: "baasdasdr", email: "[email protected]", created_at: "2010-12-30 17:20:49", updated_at: "2010-12-30 17:20:49">]

For example, the first element in that array shows one user with "foo" and "[email protected]". The rest of them can be pulled out of the database as needed with a find.

> User.find(:all, :conditions => {:email => "[email protected]", :first => "foo"})
 => [#<User id: 1, first: "foo", last: "bar", email: "[email protected]", created_at: "2010-12-30 17:14:28", updated_at: "2010-12-30 17:14:28">, 
     #<User id: 3, first: "foo", last: "barney", email: "[email protected]", created_at: "2010-12-30 17:14:43", updated_at: "2010-12-30 17:14:43">]
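If the table is small enough to load into memory, the same two steps (find the duplicated pairs, then collect every record for each pair) can be sketched in plain Ruby with Enumerable#group_by. This is only an illustration of the grouping logic, not a replacement for the database query. The Struct and the sample rows, including the example.com addresses, are made up for the example.

```ruby
# Hypothetical stand-in for the User model; the fields mirror the columns used above.
User = Struct.new(:id, :first, :last, :email)

# Made-up sample data for illustration only.
users = [
  User.new(1, "foo",  "bar",    "foo@example.com"),
  User.new(2, "baz",  "qux",    "baz@example.com"),
  User.new(3, "foo",  "barney", "foo@example.com"),
  User.new(4, "foo1", "a",      "foo1@example.com"),
  User.new(5, "foo1", "b",      "foo1@example.com")
]

# Group by the (first, email) pair and keep only groups with more than one
# member: the in-memory analogue of GROUP BY first, email HAVING count(*) > 1.
duplicates = users.group_by { |u| [u.first, u.email] }
                  .select { |_key, group| group.size > 1 }

# Each key is a duplicated pair; each value holds every user sharing it.
duplicates.each do |(first, email), group|
  puts "#{first} / #{email}: ids #{group.map(&:id).join(', ')}"
end
```

Unlike the grouped find, this gives you every affected user in one pass, at the cost of loading all rows first, so it is probably only reasonable for small tables.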

And it also seems like you'll want to add some better validation to your code to prevent duplicates in the future.
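In a Rails model of that era, the usual tool for this is validates_uniqueness_of with a :scope option, ideally backed by a unique database index. The check itself boils down to "reject a (first, email) pair that has already been seen", which the plain-Ruby sketch below illustrates. UserRegistry is a hypothetical class invented for this example, not part of Rails.

```ruby
require "set"

# Hypothetical in-memory registry used only to illustrate the uniqueness check;
# in a real app the model validation and a unique index would play this role.
class UserRegistry
  def initialize
    @seen = Set.new
  end

  # Returns true if the (first, email) pair was new and has been recorded,
  # false if the same pair was registered before.
  def register(first, email)
    # Set#add? returns nil when the element is already present.
    !@seen.add?([first, email]).nil?
  end
end

registry = UserRegistry.new
puts registry.register("foo", "foo@example.com")  # new pair
puts registry.register("foo", "foo@example.com")  # repeated pair
```

The first call reports the pair as new and the second reports it as a duplicate, which is the point where a validation would refuse to save the record.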

Edit:

If you need to use the big hammer of find_by_sql, because Rails 2.2 and earlier didn't support :having with find, the following should work and give you the same array that I described above.

User.find_by_sql("select * from users group by first,email having count(*) > 1")
jdl answered Oct 06 '22 01:10