I have a table which contains about 900K rows.I want to delete about 90% of the rows. Tried using TABLESAMPLE to select them randomly but didn't get much performance improvement. Here are the queries which i have tried and there times
sql> DELETE FROM users WHERE id IN (
SELECT id FROM users ORDER BY random() LIMIT 5000
)
[2017-11-22 11:35:39] 5000 rows affected in 1m 11s 55ms
sql> DELETE FROM users WHERE id IN (
SELECT id FROM users TABLESAMPLE BERNOULLI (5)
)
[2017-11-22 11:55:07] 5845 rows affected in 1m 13s 666ms
sql> DELETE FROM users WHERE id IN (
SELECT id FROM users TABLESAMPLE SYSTEM (5)
)
[2017-11-22 11:57:59] 5486 rows affected in 1m 4s 574ms
Only deleting 5% data takes about an min. So this is going to take very long for large data. Pls suggest if I am doing things right or if there is any better way to do this.
Deleting a large number of rows is always going to be slow. The way how you identify them won't make much difference.
Instead of deleting a large number it's usually a lot faster, to create a new table that contains those rows that you want to keep, e.g.:
create table users_to_keep
as
select *
from users
tablesample system (10);
then truncate the original table and insert the rows that you stored away:
truncate table users;
insert into users
select *
from users_to_keep;
If you want, you can do that in a single transaction.
As a_horse_with_no_name pointed out, the random selection itself is a relatively minor factor. And much of the cost associated with a deletion (e.g. foreign key checks) is not something you can avoid.
The only thing which stands out as an unnecessary overhead is the id
-based lookup in the DELETE
statement; you just visited the row during the random selection step, and now you're looking it up again, presumably via an index on id
.
Instead, you can perform the lookup using the row's physical location, represented by the hidden ctid
column:
DELETE FROM users WHERE ctid = ANY(ARRAY(
SELECT ctid FROM users TABLESAMPLE SYSTEM (5)
))
This gave me a ~6x speedup in an artificial test, though it will likely be dwarfed by other costs in most real-world scenarios.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With