I have these two tables and want to delete all authors from ms_author, who are not present in author.
author
(1.6M rows)
+-------+-------------+------+-----+-------+
| Field | Type | Null | Key | index |
+-------+-------------+------+-----+-------+
| id | text | NO | PRI | true |
| name | text | YES | | |
+-------+-------------+------+-----+-------+
ms_author
(120M rows)
+-------+-------------+------+-----+-------+
| Field | Type | Null | Key | index |
+-------+-------------+------+-----+-------+
| id | text | NO | PRI | |
| name | text | YES | | true |
+-------+-------------+------+-----+-------+
This is my query:
DELETE
FROM ms_author AS m
WHERE m.name NOT IN
(SELECT a.name
FROM author AS a);
I tried to estimate the query duration: ~ 130 hours.
Is there a faster way to achieve this?
EDIT:
EXPLAIN VERBOSE
output
Delete on public.ms_author m (cost=0.00..2906498718724.75 rows=59946100 width=6)"
-> Seq Scan on public.ms_author m (cost=0.00..2906498718724.75 rows=59946100 width=6)"
Output: m.ctid"
Filter: (NOT (SubPlan 1))"
SubPlan 1"
-> Materialize (cost=0.00..44334.43 rows=1660295 width=15)"
Output: a.name"
-> Seq Scan on public.author a (cost=0.00..27925.95 rows=1660295 width=15)"
Output: a.name"
Indexing author(name
):
create index author_name on author(name);
Indexing ms_author(name):
create index ms_author_name on ms_author(name);
Both have huge differences in running time. The select query runs much faster than a similar delete query. Delete does more work than select.
NOT IN vs NOT EXISTS performance in SQL Server Regarding performance aspects, SQL NOT EXISTS would be a better choice over SQL NOT IN. NOT EXISTS is significantly faster than NOT IN especially when the subquery result is very large.
If you run a DELETE statement with no conditions in the WHERE clause, all of the records from the table will be deleted.
If so, you should consider using a NOT EXISTS operator instead of NOT IN, or recast the statement as a left outer join.
I'm a big fan of the "anti-join." This works efficiently for both large and small datasets:
delete from ms_author ma
where not exists (
select null
from author a
where ma.name = a.name
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With