How do I find duplicates across multiple columns?

To list the id of every row whose (name, city) pair occurs more than once:

select s.id, t.* 
from [stuff] s
join (
    select name, city, count(*) as qty
    from [stuff]
    group by name, city
    having count(*) > 1
) t on s.name = t.name and s.city = t.city

 SELECT name, city, count(*) as qty 
 FROM stuff 
 GROUP BY name, city HAVING count(*)> 1
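The two queries above can be tried out without a server. The following is a minimal sketch, assuming SQLite (through Python's standard `sqlite3` module) as a stand-in for SQL Server and reusing the sample rows from the MySQL example further down this page. The helper names `make_db`, `duplicate_pairs`, and `duplicate_ids` are hypothetical and exist only for this illustration.

```python
import sqlite3

# Sample rows taken from the MySQL example on this page.
# Assumption: SQLite stands in for SQL Server; the SQL used here is portable.
ROWS = [
    (904834, "jim", "London"),
    (904835, "jim", "London"),
    (90145, "Fred", "Paris"),
    (90132, "Fred", "Paris"),
    (90133, "Fred", "Paris"),
    (923457, "Barney", "New York"),  # unique pair, should not be reported
]

def make_db():
    """Create an in-memory database holding the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stuff (id INTEGER NOT NULL, name TEXT NOT NULL, city TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO stuff (id, name, city) VALUES (?, ?, ?)", ROWS)
    return conn

def duplicate_pairs(conn):
    """Return each (name, city) pair that occurs more than once, with its count."""
    sql = """
        SELECT name, city, COUNT(*) AS qty
        FROM stuff
        GROUP BY name, city
        HAVING COUNT(*) > 1
        ORDER BY name, city
    """
    return conn.execute(sql).fetchall()

def duplicate_ids(conn):
    """Join the duplicated pairs back to the table to recover the row ids."""
    sql = """
        SELECT s.id
        FROM stuff s
        JOIN (
            SELECT name, city
            FROM stuff
            GROUP BY name, city
            HAVING COUNT(*) > 1
        ) t ON s.name = t.name AND s.city = t.city
        ORDER BY s.id
    """
    return [row[0] for row in conn.execute(sql)]

conn = make_db()
pairs = duplicate_pairs(conn)
ids = duplicate_ids(conn)
print(pairs)  # [('Fred', 'Paris', 3), ('jim', 'London', 2)]
print(ids)    # [90132, 90133, 90145, 904834, 904835]
```

The grouped query alone tells you which pairs repeat; the join is only needed when you also want the ids of the individual rows.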

Something like this will do the trick. I don't know how it performs, so test it against your own data first.

select
  id, name, city
from
  [stuff] s
where
1 < (select count(*) from [stuff] i where i.city = s.city and i.name = s.name)
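Before comparing performance, it helps to confirm that the correlated subquery returns the same rows as the join on grouped pairs. The sketch below does that and times both, assuming SQLite via Python's `sqlite3` module as a stand-in for SQL Server and the sample rows from the MySQL example on this page. The helper `run_timed` is hypothetical, and timings taken in SQLite say nothing about SQL Server's optimizer, so treat them only as a template for your own measurements.

```python
import sqlite3
import time

# Sample rows from the MySQL example on this page (SQLite is an assumed stand-in).
ROWS = [
    (904834, "jim", "London"),
    (904835, "jim", "London"),
    (90145, "Fred", "Paris"),
    (90132, "Fred", "Paris"),
    (90133, "Fred", "Paris"),
    (923457, "Barney", "New York"),
]

# Correlated subquery: count the matching rows for every outer row.
CORRELATED = """
    SELECT id, name, city
    FROM stuff s
    WHERE 1 < (SELECT COUNT(*) FROM stuff i
               WHERE i.city = s.city AND i.name = s.name)
    ORDER BY id
"""

# Join against the grouped list of duplicated pairs.
GROUPED = """
    SELECT s.id, s.name, s.city
    FROM stuff s
    JOIN (SELECT name, city FROM stuff
          GROUP BY name, city HAVING COUNT(*) > 1) t
      ON s.name = t.name AND s.city = t.city
    ORDER BY s.id
"""

def run_timed(conn, sql):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER, name TEXT, city TEXT)")
conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", ROWS)

correlated_rows, correlated_secs = run_timed(conn, CORRELATED)
grouped_rows, grouped_secs = run_timed(conn, GROUPED)

print(correlated_rows == grouped_rows)  # True: both forms report the same rows
print(f"correlated: {correlated_secs:.6f}s, grouped: {grouped_secs:.6f}s")
```

On a real table, run the same comparison against a realistic row count and look at the execution plans, since the result on six rows is not meaningful as a benchmark.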

Using count(*) over(partition by...) provides a simple and efficient means to locate unwanted repetition, while also listing all affected rows and all wanted columns:

SELECT
    t.*
FROM (
    SELECT
        s.*
      , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
    FROM stuff s
    ) t
WHERE t.qty > 1
ORDER BY t.name, t.city

Most recent RDBMS versions support count(*) over(partition by...). MySQL introduced "window functions" in version 8.0, so the same query works there too, as this example (run on MySQL 8.0) shows:

CREATE TABLE stuff(
   id   INTEGER  NOT NULL
  ,name VARCHAR(60) NOT NULL
  ,city VARCHAR(60) NOT NULL
);

INSERT INTO stuff(id,name,city) VALUES 
  (904834,'jim','London')
, (904835,'jim','London')
, (90145,'Fred','Paris')
, (90132,'Fred','Paris')
, (90133,'Fred','Paris')

, (923457,'Barney','New York') # not expected in result
;

SELECT
    t.*
FROM (
    SELECT
        s.*
      , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
    FROM stuff s
    ) t
WHERE t.qty > 1
ORDER BY t.name, t.city

    id | name | city   | qty
-----: | :--- | :----- | --:
 90145 | Fred | Paris  |   3
 90132 | Fred | Paris  |   3
 90133 | Fred | Paris  |   3
904834 | jim  | London |   2
904835 | jim  | London |   2

Window functions. MySQL now supports window functions that, for each row from a query, perform a calculation using rows related to that row. These include functions such as RANK(), LAG(), and NTILE(). In addition, several existing aggregate functions now can be used as window functions; for example, SUM() and AVG(). For more information, see Section 12.21, “Window Functions”.

A little late to the game on this post, but I found this way to be pretty flexible / efficient

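-- distinct: without it, a row in a group of three or more
-- is returned once for every other row it matches
select distinct
    s1.id
    ,s1.name
    ,s1.city 
from 
    stuff s1
    ,stuff s2
Where
    s1.id <> s2.id
    and s1.name = s2.name
    and s1.city = s2.city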
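A self-join pairs each row with every other row that shares its name and city, so a row in a group of three finds two partners and would be returned twice unless the result is de-duplicated. The sketch below shows the difference, assuming SQLite via Python's `sqlite3` module as a stand-in for SQL Server and the sample rows from the MySQL example on this page. The join is written with explicit JOIN ... ON syntax, which is equivalent to the comma-separated form.

```python
import sqlite3

# Sample rows from the MySQL example on this page (SQLite is an assumed stand-in).
ROWS = [
    (904834, "jim", "London"),
    (904835, "jim", "London"),
    (90145, "Fred", "Paris"),
    (90132, "Fred", "Paris"),
    (90133, "Fred", "Paris"),
    (923457, "Barney", "New York"),
]

# Plain self-join: one output row per (row, partner) combination.
PLAIN = """
    SELECT s1.id
    FROM stuff s1
    JOIN stuff s2
      ON s1.name = s2.name AND s1.city = s2.city AND s1.id <> s2.id
    ORDER BY s1.id
"""

# Same join with DISTINCT: each duplicated row is reported once.
DEDUPED = """
    SELECT DISTINCT s1.id
    FROM stuff s1
    JOIN stuff s2
      ON s1.name = s2.name AND s1.city = s2.city AND s1.id <> s2.id
    ORDER BY s1.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER, name TEXT, city TEXT)")
conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", ROWS)

plain_ids = [row[0] for row in conn.execute(PLAIN)]
distinct_ids = [row[0] for row in conn.execute(DEDUPED)]

print(plain_ids)     # [90132, 90132, 90133, 90133, 90145, 90145, 904834, 904835]
print(distinct_ids)  # [90132, 90133, 90145, 904834, 904835]
```

Note that this approach relies on id being unique: two rows that share the same id as well as the same name and city fail the `s1.id <> s2.id` test and are not reported.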

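Self-join stuff on both name and city, then group by the row's own columns and keep the rows that match more than once. Every row matches itself, so a count above 1 means at least one other row has the same name and city.

select 
   s.id, s.name, s.city 
from stuff s join stuff p ON (
   s.name = p.name AND s.city = p.city
)
group by s.id, s.name, s.city having count(*) > 1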

Tags: sql, sql-server, duplicates, sql-server-2008
