I have a fairly substantial database... not a very large one, around 1 GB of data in total.
I need to delete some rows from several tables. For example I have a table
Order
id | ... | status | ...
1 | ... | 1 | ...
...
40 | ... | 20 | ...
41 | ... | 1 | ...
...
470000 | ... | 12 | ...
Now I want to delete all orders that have status=1
I presume I do it with:
DELETE FROM Order WHERE status=1
All nice and simple, it would seem, but it takes ages! When I ran this query it was still going at 100% CPU usage after 40 minutes... and when I killed the process, nothing had been deleted.
When I tried limiting the scope by using
DELETE FROM Order WHERE status=1 AND id<1000
it took a couple of minutes to delete some 200 rows...
Is there anything I am missing in my configuration? Anything I should look for/check/change? Any ideas at all why it's so bloody inefficient?
Let me add that I usually work with MySQL and need to manage this Postgres database, but I don't really have any experience with Postgres, so it may be something very simple.
There are indexes on both the id and status columns.
The table has some 500k rows; around half of them need to be deleted.
Execution plan:
Delete (cost=0.00..19474.19 rows=266518 width=6)
-> Seq Scan on Orders (cost=0.00..19474.19 rows=266518 width=6)
Filter: (statusid = 1)
There are no triggers or rules of any sort. What's more, I forgot to add that this is a fresh copy of the table; it was moved from another server with export/import. Perhaps this plays a role somehow?
Will deleting indexes help?
Slow DELETEs are almost always caused by missing indexes on foreign keys that directly or indirectly reference the table being deleted from.
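For example (the table and column names below are hypothetical, not from the question): if an order_items table references the orders table through an unindexed order_id column, every deleted order forces a scan of order_items to check the constraint. You can list the foreign keys that point at the table and then index the referencing columns:
-- list foreign keys that reference the orders table
SELECT conname, conrelid::regclass AS referencing_table
FROM pg_constraint
WHERE contype = 'f' AND confrelid = 'orders'::regclass;
-- index the referencing column (hypothetical table/column)
CREATE INDEX order_items_order_id_idx ON order_items (order_id);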
TRUNCATE provides a faster mechanism to remove all rows from a table. There are two ways to delete rows in a table using information contained in other tables in the database: using sub-selects, or specifying additional tables in the USING clause.
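For illustration, the two forms might look like this (the customers table, its columns, and the join condition are assumptions, not part of the question's schema):
-- sub-select form
DELETE FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE inactive);
-- USING form, same intent
DELETE FROM orders o
USING customers c
WHERE o.customer_id = c.id
  AND c.inactive;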
If you update a table, the system has to maintain those indexes that are on the columns being updated. So having a lot of indexes can speed up select statements, but slow down inserts, updates, and deletes.
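If you want to experiment with that (the index name below is hypothetical), you can drop nonessential indexes before the bulk operation and rebuild them afterwards; in Postgres this tends to matter more for the bulk-reload approaches described further down than for the DELETE itself:
-- drop a nonessential index before the bulk operation
DROP INDEX IF EXISTS orders_status_idx;
-- ... run the bulk delete or reload here ...
-- rebuild it afterwards
CREATE INDEX orders_status_idx ON orders (status);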
Having nothing deleted after you killed the process is EXACTLY what you should see.
The delete happens as a transaction, meaning that either everything is deleted or nothing is. To make sure of that, the rows need to be copied somewhere before being deleted, which means that deleting 250k rows takes about as much time as inserting that many rows. In some cases it can be faster to create a new table containing everything NOT deleted and rename the new table to the old one.
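A minimal sketch of that copy-and-rename approach, assuming the table is called orders and that you recreate indexes and constraints yourself afterwards:
BEGIN;
CREATE TABLE orders_keep AS
  SELECT * FROM orders WHERE status <> 1;  -- everything you want to keep
DROP TABLE orders;                          -- fails if other tables still reference it
ALTER TABLE orders_keep RENAME TO orders;
COMMIT;
-- CREATE TABLE AS does not copy indexes, constraints, or defaults; recreate them here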
If this is being moved from another db, you will probably be better off if you can keep the 250k rows from being inserted in the first place.
(This is general RDBMS wisdom, not postgresql specific - the details of how postgres MVCC works may differ significantly.)
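If you do have to re-import, one way to filter during the load (the file name and staging table are hypothetical, and \copy here is the psql client-side command) is:
-- load the export into a staging table, then keep only what you need
CREATE TABLE orders_staging (LIKE orders INCLUDING DEFAULTS);
\copy orders_staging FROM 'orders_export.csv' WITH CSV
INSERT INTO orders SELECT * FROM orders_staging WHERE status <> 1;
DROP TABLE orders_staging;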
It is much quicker to use COPY TO / TRUNCATE / COPY FROM on big tables.
But, of course, you have to be careful with references and, if possible, disable triggers.
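A rough sketch of that workflow from psql (the file name is hypothetical, and TRUNCATE will refuse to run if other tables have foreign keys pointing at orders unless you handle those first):
-- dump the rows you want to keep
\copy (SELECT * FROM orders WHERE status <> 1) TO 'orders_keep.csv' WITH CSV
-- wipe the table, then reload only the kept rows
TRUNCATE orders;
\copy orders FROM 'orders_keep.csv' WITH CSV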