Remove duplicates, keeping the record with the fewest NULL values

Tags:

sql-server

I have an employees table with about 25 columns. Right now it contains a lot of duplicates, and I would like to get rid of some of them.

First, I want to find the duplicates by looking for multiple records that have the same values in first name, last name, employee number, company number and status.

SELECT
    firstname, lastname, employeenumber, companynumber, statusflag
FROM
    employeemaster
GROUP BY
    firstname, lastname, employeenumber, companynumber, statusflag
HAVING 
    (COUNT(*) > 1)

This returns the duplicate groups, but my goal is to keep the single best record in each group and delete the others. The "best" record is the one with the fewest NULL values across all of the other columns. How can I do this?

I am using Microsoft SQL Server 2012 Management Studio.

EXAMPLE:


Red: DELETE Green: KEEP

NOTE: The real table has many more columns than this example shows.


asked Jan 13 '15 16:01

user3788671
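A sketch for inspecting the duplicate rows themselves, rather than only the grouped keys, together with how many NULLs each row has. It assumes a unique Record column and borrows the sample rows from the first answer's test script; UserName and Branch stand in for the real table's other non-key columns, each of which would need its own CASE term. It uses only standard window functions, so it should run on SQL Server 2012 as well as on SQLite 3.25 or later.

```sql
-- Sample data (assumption): mirrors the first answer's test rows, with Record
-- supplied explicitly instead of IDENTITY so the script is not SQL Server-only.
CREATE TABLE EmployeeMaster (
    Record         INT PRIMARY KEY,
    FirstName      VARCHAR(50),
    LastName       VARCHAR(50),
    EmployeeNumber INT,
    CompanyNumber  INT,
    StatusFlag     INT,
    UserName       VARCHAR(50),
    Branch         VARCHAR(50)
);

INSERT INTO EmployeeMaster
    (Record, FirstName, LastName, EmployeeNumber, CompanyNumber, StatusFlag, UserName, Branch)
VALUES
    (1, 'Jake', 'Jones', 1234, 1, 1, 'JJONES', 'PHX'),
    (2, 'Jake', 'Jones', 1234, 1, 1, NULL, 'PHX'),
    (3, 'Jake', 'Jones', 1234, 1, 1, NULL, NULL),
    (4, 'Jane', 'Jones', 5678, 1, 1, 'JJONES2', NULL);

-- List every row that belongs to a duplicate group, with the group size
-- (DupCount) and the number of NULL non-key columns (NullCount).
SELECT Record, FirstName, LastName, UserName, Branch, DupCount, NullCount
FROM (
    SELECT e.*,
           COUNT(*) OVER (PARTITION BY FirstName, LastName, EmployeeNumber,
                                       CompanyNumber, StatusFlag) AS DupCount,
           CASE WHEN UserName IS NULL THEN 1 ELSE 0 END
         + CASE WHEN Branch   IS NULL THEN 1 ELSE 0 END AS NullCount
    FROM EmployeeMaster AS e
) AS d
WHERE DupCount > 1
ORDER BY FirstName, LastName, EmployeeNumber, CompanyNumber, StatusFlag,
         NullCount, Record;
```

For the sample data this returns the three Jake Jones rows (Record 1, 2 and 3) with NullCount 0, 1 and 2; within each group, the row listed first is the one to keep.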

2 Answers

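You can use the sys.columns catalog view to get the list of columns and build a dynamic query. The query returns a KeepThese value for every record; within each duplicate group, the record with the most non-NULL values gets KeepThese = 1, and those are the records to keep based on your criteria.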

-- create and populate a test table
create table EmployeeMaster
  (
    Record int identity(1,1),
    FirstName varchar(50),
    LastName varchar(50),
    EmployeeNumber int,
    CompanyNumber int,
    StatusFlag int,
    UserName varchar(50),
    Branch varchar(50)
  );
insert into EmployeeMaster
  (
    FirstName,
    LastName,
    EmployeeNumber,
    CompanyNumber,
    StatusFlag,
    UserName,
    Branch
  )
  values
    ('Jake','Jones',1234,1,1,'JJONES','PHX'),
    ('Jake','Jones',1234,1,1,NULL,'PHX'),
    ('Jake','Jones',1234,1,1,NULL,NULL),
    ('Jane','Jones',5678,1,1,'JJONES2',NULL);

-- rank records by their number of non-NULL values; the column list comes from sys.columns
declare @sql varchar(max)
select @sql = '
    select e.*,
        row_number() over(partition by
                            e.FirstName,
                            e.LastName,
                            e.EmployeeNumber,
                            e.CompanyNumber,
                            e.StatusFlag
                          order by n.NonNullCnt desc) as KeepThese
    from EmployeeMaster e
        cross apply (select count(n.value) as NonNullCnt from (select ' +
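            replace((
                -- one "cast(<column> as varchar(50)) as value" per column; the cast gives UNION ALL a common type,
                -- and quotename() protects column names that contain spaces or reserved words
                select 'cast(' + quotename(c.name) + ' as varchar(50)) as value union all select '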
                from sys.columns c
                where c.object_id = t.object_id
                for xml path('')
                ) + '#',' union all select #','') + ')n)n'
from sys.tables t
where t.name = 'EmployeeMaster'

exec(@sql)


answered Sep 24 '22 04:09

Ron Smith

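Try this. It ranks the rows in each duplicate group by how many of the non-key columns are NULL, so rn = 1 marks the row with the fewest NULLs.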

;WITH cte
     AS (SELECT Row_number()
                  OVER(
                    partition BY firstname, lastname, employeenumber, companynumber, statusflag
                    ORDER BY (SELECT NULL)) rn,
                firstname,
                lastname,
                employeenumber,
                companynumber,
                statusflag,
                username,
                branch
         FROM   employeemaster),
     cte1
     AS (SELECT a.firstname,
                a.lastname,
                a.employeenumber,
                a.companynumber,
                a.statusflag,
                Row_number()
                  OVER(
                    partition BY a.firstname, a.lastname, a.employeenumber, a.companynumber, a.statusflag
                    ORDER BY (CASE WHEN a.username IS NULL THEN 1 ELSE 0 END +CASE WHEN a.branch IS NULL THEN 1 ELSE 0 END) )rn
                        -- add a CASE term like the two above for every other nullable column
         FROM   cte a
                JOIN employeemaster b
                  ON a.firstname = b.firstname
                     AND a.lastname = b.lastname
                     AND a.employeenumber = b.employeenumber
                     AND a.companynumber = b.companynumber
                     AND a.statusflag = b.statusflag)
SELECT *
FROM   cte1
WHERE  rn = 1

answered Sep 25 '22 04:09

Pரதீப்
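Neither answer actually deletes anything, and the self-join in the second answer is not needed: ROW_NUMBER() can rank each group by its NULL count directly. (The join also drops any group whose key columns contain a NULL, because NULL = NULL is not true, whereas PARTITION BY puts NULL keys in the same group.) Below is a minimal sketch of the delete, assuming a unique Record column like the one in the first answer's test table. Only UserName and Branch are shown; a CASE term is needed for every other nullable column. Ties are broken by the lowest Record so the outcome is repeatable. It is plain SQL, so it should also run on SQLite 3.25+ for a quick test.

```sql
-- Sample data (assumption): same rows as the first answer's test script,
-- with Record given explicitly so the script also runs outside SQL Server.
CREATE TABLE EmployeeMaster (
    Record         INT PRIMARY KEY,
    FirstName      VARCHAR(50),
    LastName       VARCHAR(50),
    EmployeeNumber INT,
    CompanyNumber  INT,
    StatusFlag     INT,
    UserName       VARCHAR(50),
    Branch         VARCHAR(50)
);

INSERT INTO EmployeeMaster
    (Record, FirstName, LastName, EmployeeNumber, CompanyNumber, StatusFlag, UserName, Branch)
VALUES
    (1, 'Jake', 'Jones', 1234, 1, 1, 'JJONES', 'PHX'),
    (2, 'Jake', 'Jones', 1234, 1, 1, NULL, 'PHX'),
    (3, 'Jake', 'Jones', 1234, 1, 1, NULL, NULL),
    (4, 'Jane', 'Jones', 5678, 1, 1, 'JJONES2', NULL);

-- Rank rows inside each duplicate group: fewest NULLs first, lowest Record
-- as the tie-breaker. Delete every row that is not ranked first.
DELETE FROM EmployeeMaster
WHERE Record IN (
    SELECT Record
    FROM (
        SELECT Record,
               ROW_NUMBER() OVER (
                   PARTITION BY FirstName, LastName, EmployeeNumber,
                                CompanyNumber, StatusFlag
                   ORDER BY CASE WHEN UserName IS NULL THEN 1 ELSE 0 END
                          + CASE WHEN Branch   IS NULL THEN 1 ELSE 0 END,
                            Record
               ) AS rn
        FROM EmployeeMaster
    ) AS ranked
    WHERE rn > 1
);

-- Remaining rows: Record 1 (Jake, JJONES, PHX) and Record 4 (Jane, JJONES2, NULL)
SELECT Record, FirstName, UserName, Branch
FROM EmployeeMaster
ORDER BY Record;
```

Running the inner SELECT on its own first is a cheap way to review what will be removed. On SQL Server specifically, the same ranking can be written as a CTE and deleted from directly (`WITH ranked AS (...) DELETE FROM ranked WHERE rn > 1`), which does not require a unique key column.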
