As the title says, I've been running this command to delete some records from a table but I noticed that it wasn't deleting the same records listed if I ran a query selecting the TOP(X)
records right before.
Shouldn't the DELETE
command follow the same default ordering that SELECT
does? If not, why?
I don't know if it matters, but here is the delete command I'm running:
DELETE TOP (100000) t
FROM Ticket AS t
LEFT JOIN Series s on t.SeriesId = s.id
LEFT JOIN Payment p on t.id = p.TicketId
WHERE s.EndDate <= DATEADD(MONTH, -1, GETDATE()) AND t.ExportDate is null AND p.TicketId is null
For the select, just replace the first line with SELECT TOP(100000) t.*
Without an ORDER BY, the order of the data CANNOT be guaranteed.
Nope. There is no "default ordering". If you don't specify an ORDER BY
any ordering is valid.
The resulting order will be execution plan dependant and may not even be stable within a given execution plan (e.g. parallelism may distribute rows to different threads with the exact distribution depending on how busy each scheduler was at the time)
Moreover even ignoring the ORDER BY
issue the semantics of the TOP
are different for the SELECT
and DELETE
. If there are multiple matches in Series
for a given Ticket
(and matching the EndDate
) the SELECT
will count the joined rows whereas the DELETE
will collapse these down to 1 before the delete and just count the rows deleted from Ticket
.
To do the DELETE
in a deterministic manner you can first construct a SELECT
statement that is valid for this purpose (with all joins replaced with EXISTS
/NON EXISTS
) and then you can put that in a CTE and delete from that.
WITH CTE
AS (SELECT TOP (100000) t.*
FROM Ticket AS t
WHERE t.ExportDate IS NULL
AND EXISTS (SELECT *
FROM Series s
WHERE t.SeriesId = s.id
AND s.EndDate <= DATEADD(MONTH, -1, GETDATE()))
AND NOT EXISTS(SELECT *
FROM Payment p
WHERE t.id = p.TicketId)
ORDER BY t.id)
DELETE FROM CTE
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With