Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mysql Deleting multiple random row from a table

Tags:

php

mysql

I now have a table that has 604 000 row. I would like to drop 4000 random rows so my table will only contains 600 000 entries.

Would there be a quick way to do so ?

Thanks a lot.

like image 980
silkAdmin Avatar asked Jan 26 '12 10:01

silkAdmin


People also ask

Can we delete multiple rows in MySQL?

We can use DELETE statement along with a WHERE clause, which identifies those multiple rows, to delete multiple rows from MySQL table.

How do you delete random rows in SQL?

First, you specify the name of the table from which the rows are to be deleted in the FROM clause. Second, to specify the number or percent of random rows that will be deleted, you use the TOP clause.

How do I delete rows faster in MySQL?

To delete rows in a MySQL table, use the DELETE FROM statement: DELETE FROM products WHERE product_id=1; The WHERE clause is optional, but you'll usually want it, unless you really want to delete every row from the table.


2 Answers

In theory, this will be random and fast. In practise, it will be only fast:

DELETE FROM tableX
LIMIT 4000

This will be random but terribly slow, with 600K rows:

DELETE FROM tableX
ORDER BY RAND() 
LIMIT 4000

This won't be truly random (as there are usually gaps in the ids) and it may not even delete exactly 4000 rows (but a few less when there are many gaps) but it's probably faster than the previous.

An extra wrapping in a subquery is needed because the syntax for Delete from multiple tables does not allow LIMIT:

DELETE td
FROM
      tableX AS td
  JOIN 
      ( SELECT t.id
        FROM
             tableX AS t 
          CROSS JOIN
              ( SELECT MAX(id) AS maxid 
                FROM tableX
              ) AS m
          JOIN
              ( SELECT RAND() AS rndm
                FROM tableX AS tr
                LIMIT 5000
              ) AS r
            ON 
               t.id = CEIL( rndm * maxid )
        LIMIT 4000
      ) AS x
      ON 
         x.id = td.id 

Explain output (of the subquery, from a 400K rows table):

id           table        possible_keys  key_len     rows 
 select_type         type             key     ref         Extra
1 PRIMARY <derived2> system                             1
1 PRIMARY <derived3> ALL                             5000
1 PRIMARY t          eq_ref PRIMARY PRIMARY 4 func      1 Using where;Using index
3 DERIVED tr         index          PRIMARY 4      398681 Using index
2 DERIVED                                            Select tables optimized away
like image 94
ypercubeᵀᴹ Avatar answered Sep 28 '22 11:09

ypercubeᵀᴹ


delete from yourTable limit 4000
like image 28
juergen d Avatar answered Sep 28 '22 13:09

juergen d