Fastest "Get Duplicates" SQL script

Tags: performance, sql, scripting, duplicates

What is an example of a fast SQL query to find duplicates in datasets with hundreds of thousands of records? I typically use something like:

SELECT afield1, afield2
FROM afile a
WHERE 1 < (SELECT count(afield1)
           FROM afile b
           WHERE a.afield1 = b.afield1);

But this is quite slow.

628

asked Oct 13 '08 09:10

Johan Bresler

2 Answers

This is the more direct way:

select afield1, count(afield1)
from afile
group by afield1
having count(afield1) > 1
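
To try the GROUP BY / HAVING approach without a full database, here is a minimal sketch in Python using the standard-library sqlite3 module and an in-memory table. The sample rows are invented for illustration, and the table name afile is taken from the question; only the shape of the query comes from this thread. Actual timings on hundreds of thousands of rows will depend on the database engine and on whether afield1 is indexed.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the thread).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE afile (afield1 TEXT, afield2 TEXT)")
conn.executemany(
    "INSERT INTO afile (afield1, afield2) VALUES (?, ?)",
    [("a", "x1"), ("b", "x2"), ("a", "x3"), ("c", "x4"), ("b", "x5"), ("a", "x6")],
)

# Group rows by the key and keep only the groups with more than one row.
# ORDER BY is added here only to make the output deterministic.
duplicate_counts = conn.execute(
    """
    SELECT afield1, COUNT(afield1) AS n
    FROM afile
    GROUP BY afield1
    HAVING COUNT(afield1) > 1
    ORDER BY afield1
    """
).fetchall()

print(duplicate_counts)  # [('a', 3), ('b', 2)]
```

Note that this returns one row per duplicated value with its count, not the individual duplicate records.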

171

answered Sep 28 '22 07:09

Vinko Vrsalovic

You could try:

select afield1, afield2
from afile a
where afield1 in (
  select afield1
  from afile
  group by afield1
  having count(*) > 1
);
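
As a quick sanity check, the sketch below (Python with the standard-library sqlite3 module, on invented sample rows) runs this IN-subquery form next to the correlated-subquery form from the question and confirms that both return the same full rows. It only checks equivalence on a tiny table with no NULL keys; it is not a benchmark, and which form is faster is something to measure on your own database.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the thread).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE afile (afield1 TEXT, afield2 TEXT)")
conn.executemany(
    "INSERT INTO afile (afield1, afield2) VALUES (?, ?)",
    [("a", "x1"), ("b", "x2"), ("a", "x3"), ("c", "x4"), ("b", "x5"), ("a", "x6")],
)

# IN-subquery form: find the duplicated keys once, then fetch the matching rows.
in_rows = conn.execute(
    """
    SELECT afield1, afield2
    FROM afile
    WHERE afield1 IN (
        SELECT afield1 FROM afile GROUP BY afield1 HAVING COUNT(*) > 1
    )
    ORDER BY afield1, afield2
    """
).fetchall()

# Correlated-subquery form from the question, for comparison.
correlated_rows = conn.execute(
    """
    SELECT afield1, afield2
    FROM afile a
    WHERE 1 < (SELECT COUNT(afield1) FROM afile b WHERE a.afield1 = b.afield1)
    ORDER BY afield1, afield2
    """
).fetchall()

print(in_rows)
print(in_rows == correlated_rows)  # True
```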

answered Sep 28 '22 07:09

Tony Andrews
