I now have a table that has 604 000 row. I would like to drop 4000 random rows so my table will only contains 600 000 entries.
Would there be a quick way to do so ?
Thanks a lot.
We can use DELETE statement along with a WHERE clause, which identifies those multiple rows, to delete multiple rows from MySQL table.
First, you specify the name of the table from which the rows are to be deleted in the FROM clause. Second, to specify the number or percent of random rows that will be deleted, you use the TOP clause.
To delete rows in a MySQL table, use the DELETE FROM statement: DELETE FROM products WHERE product_id=1; The WHERE clause is optional, but you'll usually want it, unless you really want to delete every row from the table.
In theory, this will be random and fast. In practise, it will be only fast:
DELETE FROM tableX
LIMIT 4000
This will be random but terribly slow, with 600K rows:
DELETE FROM tableX
ORDER BY RAND()
LIMIT 4000
This won't be truly random (as there are usually gaps in the ids) and it may not even delete exactly 4000 rows (but a few less when there are many gaps) but it's probably faster than the previous.
An extra wrapping in a subquery is needed because the syntax for Delete from multiple tables does not allow LIMIT
:
DELETE td
FROM
tableX AS td
JOIN
( SELECT t.id
FROM
tableX AS t
CROSS JOIN
( SELECT MAX(id) AS maxid
FROM tableX
) AS m
JOIN
( SELECT RAND() AS rndm
FROM tableX AS tr
LIMIT 5000
) AS r
ON
t.id = CEIL( rndm * maxid )
LIMIT 4000
) AS x
ON
x.id = td.id
Explain output (of the subquery, from a 400K rows table):
id table possible_keys key_len rows
select_type type key ref Extra
1 PRIMARY <derived2> system 1
1 PRIMARY <derived3> ALL 5000
1 PRIMARY t eq_ref PRIMARY PRIMARY 4 func 1 Using where;Using index
3 DERIVED tr index PRIMARY 4 398681 Using index
2 DERIVED Select tables optimized away
delete from yourTable limit 4000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With