I am working with a table which is an extract of a set of other tables. All of the rows of the extract table should be unique according to keys D1, D2 and D3. They are not. It appears that an earlier developer attempted to solve this problem by using a SELECT DISTINCT
across all columns being queried from this table. This will work, but only if every row which is a duplicate on (D1, D2, D3) is also a duplicate across the non-key columns (ignoring the IDENTITY column that was added to the extract table).
In other words, given rows as follows:
D1 D2 D3 C4 C5 C6
=== === === === === ===
A B C X1 X2 X3
A B C X1 X2 X3
then
SELECT DISTINCT D1, D2, D3, C4, C5, C6
FROM BAD_TABLE
will "work", as there's no difference between the rows which are duplicated on (D1,D2,D3). But if the table contained
D1 D2 D3 C4 C5 C6
=== === === === === ===
A B C X1 X2 X3
A B C X1 X2 X4
then SELECT DISTINCT would return two rows for the key (A,B,C). Furthermore, we would have to decide which of X3 or X4 was the "correct" value.
I know how to find the duplicates on (D1,D2,D3). I even know how to find the duplicates across all the columns (other than the IDENTITY column):
;
WITH DUPLICATES(D1,D2,D3) AS
(
SELECT D1, D2, D3
FROM SOURCE
GROUP BY D1, D2, D3
HAVING COUNT(*)>1
)
SELECT S.D1, S.D2, S.D3, S.C4, S.C5, S.C6
FROM SOURCE S
INNER JOIN DUPLICATES D
ON S.D1 = D.D1 AND S.D2 = D.D2 AND S.D3 = D.D3
ORDER BY S.D1, S.D2, S.D3, S.C4, S.C5, S.C6
The question is, how do I find the subset of the above resultset which are duplicates on (D1,D2,D3), but not duplicates on (D1,D2,D3,C4,C5,C6)?
In Excel, there are several ways to filter for unique values—or remove duplicate values: To filter for unique values, click Data > Sort & Filter > Advanced. To remove duplicate values, click Data > Data Tools > Remove Duplicates.
You can do it by joining the table on itself, saying the D's are all equal and at least one of the C's are not equal.
CREATE TABLE #Source (
D1 VARCHAR(2),
D2 VARCHAR(2),
D3 VARCHAR(2),
C4 VARCHAR(2),
C5 VARCHAR(2),
C6 VARCHAR(2) );
INSERT INTO #Source VALUES ('A', 'B', 'C', 'X1', 'X2', 'X3');
INSERT INTO #Source VALUES ('A', 'B', 'C', 'X1', 'X2', 'X4');
INSERT INTO #Source VALUES ('A', 'B', 'D', 'X1', 'X2', 'X3');
INSERT INTO #Source VALUES ('A', 'B', 'D', 'X1', 'X2', 'X3');
SELECT S1.D1, S1.D2, S1.D3, S1.C4 C4_1, S2.C4 C4_2, S1.C5 C5_1, S2.C5 C5_2, S1.C6 C6_1, S2.C6 C6_2
FROM
#Source S1
INNER JOIN
#Source S2
ON
( S1.D1 = S2.D1
AND S1.D2 = S2.D2
AND S1.D3 = S2.D3
AND ( S1.C4 <> S2.C4
OR S1.C5 <> S2.C5
OR S1.C6 <> S2.C6
)
);
DROP TABLE #Source;
Gives the following results:
D1 D2 D3 C4_1 C4_2 C5_1 C5_2 C6_1 C6_2
---- ---- ---- ---- ---- ---- ---- ---- ----
A B C X1 X1 X2 X2 X4 X3
A B C X1 X1 X2 X2 X3 X4
Also note that this is compatible with MS SQL 2000 as you later indicated is required in How to Convert a SQL Query using Common Table Expressions to One Without (for SQL Server 2000).
I haven't had a chance to try Conrad's answer yet, but came up with one of my own. It's rather a "duh" moment.
So, if you want to find all the rows in set A except for those that are in set B, you use the EXCEPT operator:
;
WITH KEYDUPLICATES(D1,D2,D3) AS
(
SELECT D1, D2, D3
FROM SOURCE
GROUP BY D1, D2, D3
HAVING COUNT(*)>1
),
KEYDUPLICATEROWS AS
(
SELECT S.D1, S.D2, S.D3, S.C4, S.C5, S.C6
FROM SOURCE S
INNER JOIN KEYDUPLICATES D
ON S.D1 = D.D1 AND S.D2 = D.D2 AND S.D3 = D.D3
),
FULLDUPLICATES AS
(
SELECT S.D1, S.D2, S.D3, S.C4, S.C5, S.C6
FROM SOURCE S
GROUP BY S.D1, S.D2, S.D3, S.C4, S.C5, S.C6
HAVING COUNT(*)>1
)
SELECT KR.D1, KR.D2, KR.D3, KR.C4, KR.C5, KR.C6
FROM KEYDUPLICATEROWS AS KR
EXCEPT
SELECT FD.D1, FD.D2, FD.D3, FD.C4, FD.C5, FD.C6
FROM FULLDUPLICATES AS FD
ORDER BY D1, D2, D3, C4, C5, C6
This seems to be showing me 1500 rows which are duplicates across (D1,D2,D3), but which are only duplicates across a subset of (D1,D2,D3,C4,C5,C6). In fact, it appears they are duplicates across (D1,D2,D3,C4,C5).
How to confirm that will be the subject of another question.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With