Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance of DELETE with NOT IN (SELECT ...)

I have these two tables and want to delete all authors from ms_author, who are not present in author.

author (1.6M rows)

+-------+-------------+------+-----+-------+
| Field | Type        | Null | Key | index |
+-------+-------------+------+-----+-------+
| id    | text        | NO   | PRI | true  |
| name  | text        | YES  |     |       |
+-------+-------------+------+-----+-------+

ms_author (120M rows)

+-------+-------------+------+-----+-------+
| Field | Type        | Null | Key | index |
+-------+-------------+------+-----+-------+
| id    | text        | NO   | PRI |       |
| name  | text        | YES  |     | true  |
+-------+-------------+------+-----+-------+

This is my query:

    DELETE
FROM ms_author AS m
WHERE m.name NOT IN
                   (SELECT a.name
                    FROM author AS a);

I tried to estimate the query duration: ~ 130 hours.
Is there a faster way to achieve this?

EDIT:

EXPLAIN VERBOSE output

Delete on public.ms_author m  (cost=0.00..2906498718724.75 rows=59946100 width=6)"
  ->  Seq Scan on public.ms_author m  (cost=0.00..2906498718724.75 rows=59946100 width=6)"
        Output: m.ctid"
        Filter: (NOT (SubPlan 1))"
        SubPlan 1"
          ->  Materialize  (cost=0.00..44334.43 rows=1660295 width=15)"
                Output: a.name"
                ->  Seq Scan on public.author a  (cost=0.00..27925.95 rows=1660295 width=15)"
                      Output: a.name"

Indexing author(name):

create index author_name on author(name);

Indexing ms_author(name):

create index ms_author_name on ms_author(name);
like image 500
Sebbas Avatar asked Dec 14 '15 09:12

Sebbas


People also ask

Is DELETE faster than select?

Both have huge differences in running time. The select query runs much faster than a similar delete query. Delete does more work than select.

Which is faster not in or not exists?

NOT IN vs NOT EXISTS performance in SQL Server Regarding performance aspects, SQL NOT EXISTS would be a better choice over SQL NOT IN. NOT EXISTS is significantly faster than NOT IN especially when the subquery result is very large.

What happens if you run a DELETE statement with no conditions?

If you run a DELETE statement with no conditions in the WHERE clause, all of the records from the table will be deleted.

What can I use instead of not in SQL?

If so, you should consider using a NOT EXISTS operator instead of NOT IN, or recast the statement as a left outer join.


1 Answers

I'm a big fan of the "anti-join." This works efficiently for both large and small datasets:

delete from ms_author ma
where not exists (
  select null
  from author a
  where ma.name = a.name
)
like image 82
Hambone Avatar answered Oct 03 '22 20:10

Hambone