Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL query: Delete all records from the table except latest N?

Is it possible to build a single mysql query (without variables) to remove all records from the table, except latest N (sorted by id desc)?

Something like this, only it doesn't work :)

delete from table order by id ASC limit ((select count(*) from table ) - N) 

Thanks.

like image 692
serg Avatar asked Feb 23 '09 18:02

serg


People also ask

How do you select last n records from a table?

The following is the syntax to get the last 10 records from the table. Here, we have used LIMIT clause. SELECT * FROM ( SELECT * FROM yourTableName ORDER BY id DESC LIMIT 10 )Var1 ORDER BY id ASC; Let us now implement the above query.

How do I select all rows except one in SQL?

The SQL EXCEPT operator is used to return all rows in the first SELECT statement that are not returned by the second SELECT statement. Each SELECT statement will define a dataset. The EXCEPT operator will retrieve all records from the first dataset and then remove from the results all records from the second dataset.

How do I select the last 5 records of a table?

1 Answer. ORDER BY id ASC; In the above query, we used subquery with the TOP clause that returns the table with the last 5 records sorted by ID in descending order. Again, we used to order by clause to sort the result-set of the subquery in ascending order by the ID column.


2 Answers

You cannot delete the records that way, the main issue being that you cannot use a subquery to specify the value of a LIMIT clause.

This works (tested in MySQL 5.0.67):

DELETE FROM `table` WHERE id NOT IN (   SELECT id   FROM (     SELECT id     FROM `table`     ORDER BY id DESC     LIMIT 42 -- keep this many records   ) foo ); 

The intermediate subquery is required. Without it we'd run into two errors:

  1. SQL Error (1093): You can't specify target table 'table' for update in FROM clause - MySQL doesn't allow you to refer to the table you are deleting from within a direct subquery.
  2. SQL Error (1235): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery' - You can't use the LIMIT clause within a direct subquery of a NOT IN operator.

Fortunately, using an intermediate subquery allows us to bypass both of these limitations.


Nicole has pointed out this query can be optimised significantly for certain use cases (such as this one). I recommend reading that answer as well to see if it fits yours.

like image 67
Alex Barrett Avatar answered Sep 20 '22 09:09

Alex Barrett


I know I'm resurrecting quite an old question, but I recently ran into this issue, but needed something that scales to large numbers well. There wasn't any existing performance data, and since this question has had quite a bit of attention, I thought I'd post what I found.

The solutions that actually worked were the Alex Barrett's double sub-query/NOT IN method (similar to Bill Karwin's), and Quassnoi's LEFT JOIN method.

Unfortunately both of the above methods create very large intermediate temporary tables and performance degrades quickly as the number of records not being deleted gets large.

What I settled on utilizes Alex Barrett's double sub-query (thanks!) but uses <= instead of NOT IN:

DELETE FROM `test_sandbox`   WHERE id <= (     SELECT id     FROM (       SELECT id       FROM `test_sandbox`       ORDER BY id DESC       LIMIT 1 OFFSET 42 -- keep this many records     ) foo   ); 

It uses OFFSET to get the id of the Nth record and deletes that record and all previous records.

Since ordering is already an assumption of this problem (ORDER BY id DESC), <= is a perfect fit.

It is much faster, since the temporary table generated by the subquery contains just one record instead of N records.

Test case

I tested the three working methods and the new method above in two test cases.

Both test cases use 10000 existing rows, while the first test keeps 9000 (deletes the oldest 1000) and the second test keeps 50 (deletes the oldest 9950).

+-----------+------------------------+----------------------+ |           | 10000 TOTAL, KEEP 9000 | 10000 TOTAL, KEEP 50 | +-----------+------------------------+----------------------+ | NOT IN    |         3.2542 seconds |       0.1629 seconds | | NOT IN v2 |         4.5863 seconds |       0.1650 seconds | | <=,OFFSET |         0.0204 seconds |       0.1076 seconds | +-----------+------------------------+----------------------+ 

What's interesting is that the <= method sees better performance across the board, but actually gets better the more you keep, instead of worse.

like image 45
Nicole Avatar answered Sep 22 '22 09:09

Nicole