I have the following structure for the table DataTable
: every column is of the datatype int, RowID
is an identity column and the primary key. LinkID
is a foreign key and links to rows of an other table.
RowID LinkID Order Data DataSpecifier
1 120 1 1 1
2 120 2 1 3
3 120 3 1 10
4 120 4 1 13
5 120 5 1 10
6 120 6 1 13
7 371 1 6 2
8 371 2 3 5
9 371 3 8 1
10 371 4 10 1
11 371 5 7 2
12 371 6 3 3
13 371 7 7 2
14 371 8 17 4
.................................
.................................
I'm trying to do a query which alters every LinkID
batch in the following way:
LinkID
(e.g. the first batch is the first 6 rows here)Order
columnData
and DataSpecifier
columns as one compare unit (They can be thought as one column, called dataunit
):
Order
1 onwards, until a duplicate dataunit
comes byLinkID
So for the LinkID
120
:
Order=1
here), go as long as you don't see a duplicate.Order = 5
(dataunit
1 10
was already seen).LinkID=120 AND Order>=5
After similar process for LinkID
371
(and every other LinkID
in the table), the processed table will look like this:
RowID LinkID Order Data DataSpecifier
1 120 1 1 1
2 120 2 1 3
3 120 3 1 10
4 120 4 1 13
7 371 1 6 2
8 371 2 3 5
9 371 3 8 1
10 371 4 10 1
11 371 5 7 2
12 371 6 3 3
.................................
.................................
I've done quite a lot of SQL queries, but never something this complicated. I know I need to use a query which is something like this:
DELETE FROM DataTable
WHERE RowID IN (SELECT RowID
FROM DataTable
WHERE -- ?
GROUP BY LinkID
HAVING COUNT(*) > 1 -- ?
ORDER BY [Order]);
But I just can't seem to wrap my head around this and get the query right. I would preferably do this in pure SQL, with one executable (and reusable) query.
SQL Delete Duplicate Rows using Group By and Having Clause According to Delete Duplicate Rows in SQL, for finding duplicate rows, you need to use the SQL GROUP BY clause. The COUNT function can be used to verify the occurrence of a row using the Group by clause, which groups data according to the given columns.
This method will introduce the Remove Duplicates feature to remove entire rows based on duplicates in one column easily in Excel. 1. Select the range you will delete rows based on duplicates in one column, and then click Data > Remove Duplicates.
We can try using a CTE here to make things easier:
WITH cte AS (
SELECT *,
COUNT(*) OVER (PARTITION BY LinkID, Data, DataSpecifier ORDER BY [Order]) - 1 cnt
FROM DataTable
),
cte2 AS (
SELECT *,
SUM(cnt) OVER (PARTITION BY LinkID ORDER BY [Order]) num
FROM cte
)
DELETE
FROM cte
WHERE num > 0;
The logic here is to use COUNT
as an analytic function to identify the duplicate records. We use a partition of LinkID
along with Data
and DataSpecifier
. Any record with an Order
value greater than or equal to the first record with a non zero count is then targeted for deletion.
Here is a demo showing that the logic of the CTE is correct:
You can use the ROW_NUMBER()
window function to identify any rows that come after the original. After that you can delete and rows with a matching LinkID
and a greater than or equal to any encountered Order
with a row number greater than one.
(I originally used a second CTE to get the MIN order
, but I realized that it wasn't necessary as long as the join to order
was greater than equal to any order
where there was a second instance of the DataUnitId. By removing the MIN
the query plan became quite simple and efficient.)
WITH DataUnitInstances AS (
SELECT *
, ROW_NUMBER() OVER
(PARTITION BY LinkID, [Data], [DataSpecifier] ORDER BY [Order]) DataUnitInstanceId
FROM DataTable
)
DELETE FROM DataTable
FROM DataTable dt
INNER JOIN DataUnitInstances dup ON dup.LinkID = dt.LinkID
AND dup.[Order] <= dt.[Order]
AND dup.DataUnitInstanceId > 1
Here is the output from your sample data which matches your desired result:
+-------+--------+-------+------+---------------+
| RowID | LinkID | Order | Data | DataSpecifier |
+-------+--------+-------+------+---------------+
| 1 | 120 | 1 | 1 | 1 |
| 2 | 120 | 2 | 1 | 3 |
| 3 | 120 | 3 | 1 | 10 |
| 4 | 120 | 4 | 1 | 13 |
| 7 | 371 | 1 | 6 | 2 |
| 8 | 371 | 2 | 3 | 5 |
| 9 | 371 | 3 | 8 | 1 |
| 10 | 371 | 4 | 10 | 1 |
| 11 | 371 | 5 | 7 | 2 |
| 12 | 371 | 6 | 3 | 3 |
+-------+--------+-------+------+---------------+
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With