
Fastest "Get Duplicates" SQL script

What is an example of a fast SQL query to get duplicates in datasets with hundreds of thousands of records? I typically use something like:

SELECT afield1, afield2 FROM afile a WHERE 1 < (SELECT count(afield1) FROM afile b WHERE a.afield1 = b.afield1);

But this is quite slow.

Johan Bresler asked Oct 13 '08 09:10



2 Answers

This is the more direct way:

select afield1, count(afield1) from afile group by afield1 having count(afield1) > 1;
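
A minimal sketch of the GROUP BY / HAVING approach above, run against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the index name idx_afile_afield1 are assumptions made for illustration. The question does not name a database engine, and whether an index on the grouped column helps depends on the engine and the data.

```python
import sqlite3

# Assumption: a tiny stand-in for the real table, using the column names from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE afile (afield1 TEXT, afield2 TEXT)")
conn.executemany(
    "INSERT INTO afile (afield1, afield2) VALUES (?, ?)",
    [("a", "x"), ("b", "y"), ("a", "z"), ("c", "w"), ("b", "v"), ("a", "u")],
)

# Assumption: an index on the grouped column; it is not part of the original answer.
conn.execute("CREATE INDEX idx_afile_afield1 ON afile (afield1)")

# One pass over the table: group by the key and keep groups with more than one row.
duplicate_counts = conn.execute(
    """
    SELECT afield1, COUNT(afield1) AS occurrences
    FROM afile
    GROUP BY afield1
    HAVING COUNT(afield1) > 1
    ORDER BY afield1
    """
).fetchall()

print(duplicate_counts)  # [('a', 3), ('b', 2)]
```

This returns one row per duplicated value together with its count, not the individual duplicate rows.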
Vinko Vrsalovic answered Sep 28 '22 07:09



You could try:

select afield1, afield2 from afile a where afield1 in (select afield1 from afile group by afield1 having count(*) > 1);
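
A rough sketch of this IN-subquery form, again using Python's sqlite3 module with an in-memory database. The helper name find_duplicate_rows and the sample rows are hypothetical and chosen only for illustration. Relative speed against the other queries will vary by engine, so it is worth measuring on the real data.

```python
import sqlite3


def find_duplicate_rows(conn):
    """Return every (afield1, afield2) row whose afield1 value occurs more than once."""
    return conn.execute(
        """
        SELECT afield1, afield2
        FROM afile
        WHERE afield1 IN (
            SELECT afield1
            FROM afile
            GROUP BY afield1
            HAVING COUNT(*) > 1
        )
        ORDER BY afield1, afield2
        """
    ).fetchall()


# Assumption: sample data standing in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE afile (afield1 TEXT, afield2 TEXT)")
conn.executemany(
    "INSERT INTO afile (afield1, afield2) VALUES (?, ?)",
    [("a", "x"), ("b", "y"), ("a", "z"), ("c", "w")],
)

rows = find_duplicate_rows(conn)
print(rows)  # [('a', 'x'), ('a', 'z')]
```

Unlike the grouped count, this returns the full duplicate rows, so afield2 is available for each one. Rows where afield1 is NULL are never returned, because IN does not match NULL.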
Tony Andrews answered Sep 28 '22 07:09
