
Finding duplicate values in a SQL table

Tags:

sql

duplicates

It's easy to find duplicates with one field:

SELECT email, COUNT(email)
FROM users
GROUP BY email
HAVING COUNT(email) > 1

So if we have a table

ID   NAME   EMAIL
1    John   [email protected]
2    Sam    [email protected]
3    Tom    [email protected]
4    Bob    [email protected]
5    Tom    [email protected]

This query will give us John, Sam, Tom, Tom because they all have the same email.
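For anyone who wants to reproduce this, here is a minimal sketch using Python's built-in sqlite3 module. The e-mail addresses are redacted in the table above, so `shared@example.com` and `bob@example.com` are made-up stand-ins, chosen only so that John, Sam and both Toms share one address as described.

```python
import sqlite3

# Hypothetical addresses: the real e-mail values are redacted in the source table.
ROWS = [
    (1, "John", "shared@example.com"),
    (2, "Sam", "shared@example.com"),
    (3, "Tom", "shared@example.com"),
    (4, "Bob", "bob@example.com"),
    (5, "Tom", "shared@example.com"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", ROWS)

# The single-column query from the question: one row per repeated e-mail.
dup_emails = conn.execute(
    "SELECT email, COUNT(email) FROM users GROUP BY email HAVING COUNT(email) > 1"
).fetchall()

# Names of every row that carries one of those repeated e-mails, in id order.
names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM users WHERE email IN "
        "(SELECT email FROM users GROUP BY email HAVING COUNT(email) > 1) "
        "ORDER BY id"
    )
]

print(dup_emails)  # [('shared@example.com', 4)]
print(names)       # ['John', 'Sam', 'Tom', 'Tom']
```

Grouping on email alone flags all four rows, which is why Bob is the only name left out.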

However, what I want is to get duplicates with the same email and name.

That is, I want to get "Tom", "Tom".

The reason I need this: I made a mistake, and allowed inserting duplicate name and email values. Now I need to remove/change the duplicates, so I need to find them first.

asked Apr 07 '10 18:04 by Alex




1 Answer

SELECT name, email, COUNT(*)
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1

Simply group on both of the columns.
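As a rough, runnable illustration (not part of the original answer), the sketch below runs that query through Python's sqlite3 module against the question's sample table. The e-mail values are hypothetical stand-ins because the originals are redacted. The second query is an assumption about what the asker wants next: it joins the grouped result back to `users` so the individual duplicate rows, with their IDs, can be inspected before anything is removed or changed.

```python
import sqlite3

# Hypothetical addresses: the real e-mail values are redacted in the question.
ROWS = [
    (1, "John", "shared@example.com"),
    (2, "Sam", "shared@example.com"),
    (3, "Tom", "shared@example.com"),
    (4, "Bob", "bob@example.com"),
    (5, "Tom", "shared@example.com"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", ROWS)

# Group on both columns: one row per (name, email) pair that occurs more than once.
dup_pairs = conn.execute(
    "SELECT name, email, COUNT(*) FROM users "
    "GROUP BY name, email HAVING COUNT(*) > 1"
).fetchall()

# Join the duplicated pairs back to the table to list each offending row.
dup_rows = conn.execute(
    """
    SELECT u.id, u.name, u.email
    FROM users AS u
    JOIN (
        SELECT name, email
        FROM users
        GROUP BY name, email
        HAVING COUNT(*) > 1
    ) AS d ON d.name = u.name AND d.email = u.email
    ORDER BY u.id
    """
).fetchall()

print(dup_pairs)  # [('Tom', 'shared@example.com', 2)]
print(dup_rows)   # [(3, 'Tom', 'shared@example.com'), (5, 'Tom', 'shared@example.com')]
```

John and Sam drop out because their (name, email) pairs are unique; only the two Tom rows remain.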

Note: the older ANSI standard requires every non-aggregated column to appear in the GROUP BY, but this has changed with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support is not consistent:

  • Recent PostgreSQL supports it.
  • SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
  • MySQL is unpredictable unless you set sql_mode=only_full_group_by; see these related questions:
    • "GROUP BY lname ORDER BY showing wrong results"
    • "Which is the least expensive aggregate function in the absence of ANY()" (see comments in accepted answer).
  • Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).
answered Dec 03 '22 04:12 by gbn