Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Deleting from table with millions of records

Tags:

sql

mysql

I'm trying to find a way to do a conditional DELETE on an InnoDB table which contains millions of records, without locking it (thus not bringing the website down).

I've tried to find information on mysql.com, but to no avail. Any tips on how to proceed?

like image 937
changelog Avatar asked Feb 13 '10 14:02

changelog


3 Answers

I use procedure to delete

create procedure delete_last_year_data() 
begin
  DECLARE del_row varchar(255);
  DECLARE done INT DEFAULT 0;

  declare del_rows cursor for select CONCAT('DELETE FROM table_name WHERE id = ', id)
                            from table_name 
                            where created_time < '2018-01-01 00:00:00';
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  open del_rows;

  repeat
    fetch del_rows into del_row;
    if not done
    then
      set @del = del_row;
      prepare stmt from @del;
      execute stmt;
      DEALLOCATE PREPARE stmt;
    end if;
  until done end repeat;

  close del_rows;

end //

like image 160
Gavin Zhao Avatar answered Oct 24 '22 16:10

Gavin Zhao


I don't think it is possible to delete without locking. That said, I don't think locking the record you want to delete is a problem. What would be a problem is locking other rows.

I found some information on that subject here: http://dev.mysql.com/doc/refman/5.0/en/innodb-locks-set.html

What I would suggest, is to try and do a million single row deletes. I think that if you do all those in a single transaction, performance should not hurt too much. so you would get something like:

START TRANSACTION;

DELETE FROM tab WHERE id = 1;
..
..
DELETE FROM tab WHERE id = x;

COMMIT;

You can generate the required statments by doing something like

SELECT CONCAT('DELETE FROM tab WHERE id = ', id)
FROM   tab
WHERE  <some intricate condition that selects the set you want to delete>

So the advantage over this method instead of doing:

DELETE FROM tab 
WHERE  <some intricate condition that selects the set you want to delete>

is that in the first approach you only ever lock the record you're deleting, whereas in the second approach you could run the risk of locking other records that happen to be in the same range as the rows you are deleteing.

like image 31
Roland Bouman Avatar answered Oct 24 '22 18:10

Roland Bouman


If it fits your application, then you could limit the number of rows to delete, and setup a cronjob for repeating the deletion. E.g.:

DELETE FROM tab WHERE .. LIMIT 1000

I found this to be good compromise in a similar scenario.

like image 22
Frunsi Avatar answered Oct 24 '22 17:10

Frunsi