Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I find duplicate entries in a database table?

Tags:

The following query will display all Dewey Decimal numbers that have been duplicated in the "book" table:

SELECT dewey_number,   COUNT(dewey_number) AS NumOccurrences FROM book GROUP BY dewey_number HAVING ( COUNT(dewey_number) > 1 ) 

However, what I'd like to do is have my query display the name of the authors associated with the duplicated entry (the "book" table and "author" table are connected by "author_id"). In other words, the query above would yield the following:

dewey_number | NumOccurrences ------------------------------ 5000         | 2 9090         | 3 

What I'd like the results to display is something similar to the following:

author_last_name | dewey_number | NumOccurrences ------------------------------------------------- Smith            | 5000         | 2 Jones            | 5000         | 2 Jackson          | 9090         | 3 Johnson          | 9090         | 3 Jeffers          | 9090         | 3 

Any help you can provide is greatly appreciated. And, in case it comes into play, I'm using a Postgresql DB.

UPDATE: Please note that "author_last_name" is not in the "book" table.

like image 742
Huuuze Avatar asked Jan 12 '09 16:01

Huuuze


People also ask

How do I select duplicate records in SQL?

To select duplicate values, you need to create groups of rows with the same values and then select the groups with counts greater than one. You can achieve that by using GROUP BY and a HAVING clause.

How do I find duplicate names in a table in SQL?

To find the duplicate Names in the table, we have to follow these steps: Defining the criteria: At first, you need to define the criteria for finding the duplicate Names. You might want to search in a single column or more than that. Write the query: Then simply write the query to find the duplicate Names.

How do you identify duplicate records in a table and delete them?

To delete the duplicate rows from the table in SQL Server, you follow these steps: Find duplicate rows using GROUP BY clause or ROW_NUMBER() function. Use DELETE statement to remove the duplicate rows.


1 Answers

A nested query can do the job.

SELECT author_last_name, dewey_number, NumOccurrences FROM author INNER JOIN      ( SELECT author_id, dewey_number,  COUNT(dewey_number) AS NumOccurrences         FROM book         GROUP BY author_id, dewey_number         HAVING ( COUNT(dewey_number) > 1 ) ) AS duplicates ON author.id = duplicates.author_id 

(I don't know if this is the fastest way to achieve what you want.)

Update: Here is my data

SELECT * FROM author;  id | author_last_name  ----+------------------   1 | Fowler   2 | Knuth   3 | Lang  SELECT * FROM book;  id | author_id | dewey_number |         title           ----+-----------+--------------+------------------------   1 |         1 |          600 | Refactoring   2 |         1 |          600 | Refactoring   3 |         1 |          600 | Analysis Patterns   4 |         2 |          600 | TAOCP vol. 1   5 |         2 |          600 | TAOCP vol. 1   6 |         2 |          600 | TAOCP vol. 2   7 |         3 |          500 | Algebra   8 |         3 |          500 | Undergraduate Analysis   9 |         1 |          600 | Refactoring  10 |         2 |          500 | Concrete Mathematics  11 |         2 |          500 | Concrete Mathematics  12 |         2 |          500 | Concrete Mathematics 

And here is the result of the above query:

 author_last_name | dewey_number | numoccurrences  ------------------+--------------+----------------  Fowler           |          600 |              4  Knuth            |          600 |              3  Knuth            |          500 |              3  Lang             |          500 |              2 
like image 62
Federico A. Ramponi Avatar answered Oct 11 '22 12:10

Federico A. Ramponi