 

Delete huge amounts of data from huge table

Tags:

mysql

I have two tables. Let's call them KEY and VALUE.
KEY is small, somewhere around 1,000,000 records.
VALUE is huge, say 1,000,000,000 records.

Between them there is a relationship such that each KEY row can have many VALUE rows. It's not an actual foreign key, but it carries the same meaning.

The DDL looks like this:

-- KEY is a reserved word in MySQL, so the table names are quoted
create table `KEY` (
 key_id int,
 primary key (key_id)
);

create table `VALUE` (
 key_id int,
 value_id int,
 primary key (key_id, value_id)
);

Now, my problem. About half of all key_ids in VALUE have been deleted from KEY, and I need to delete them in an orderly fashion while both tables are still under high load.

It would be easy to do

delete v
  from `VALUE` v
  left join `KEY` k using (key_id)
 where k.key_id is null;

However, since a LIMIT is not allowed on a multi-table DELETE, I don't like this approach. Such a delete would take hours to run, and that makes it impossible to throttle the deletes.

Another approach is to create a cursor to find all missing key_ids and delete them one by one with a limit. That seems very slow and kind of backwards.
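Spelled out, a batched single-table variant of that idea would look something like this (just a sketch; the batch size of 1000 is arbitrary, and it would have to be re-run, with pauses in between, until no more rows are deleted):

delete from `VALUE`
 where not exists (select 1 from `KEY` where `KEY`.key_id = `VALUE`.key_id)
 limit 1000;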

Are there any other options? Some nice tricks that could help?

Asked Oct 18 '13 by Andreas Wederbrand


2 Answers

Any solution that tries to delete so much data in one transaction is going to overwhelm the rollback segment and cause a lot of performance problems.

A good tool to help is pt-archiver. It performs incremental operations on moderate-sized batches of rows, as efficiently as possible. pt-archiver can copy, move, or delete rows depending on options.

The documentation includes an example of deleting orphaned rows, which is exactly your scenario:

pt-archiver --source h=host,D=db,t=VALUE --purge \
  --where 'NOT EXISTS(SELECT * FROM `KEY` WHERE key_id=`VALUE`.key_id)' \
  --limit 1000 --commit-each

Executing this will take significantly longer to delete the data, but it won't use too many resources and it won't interrupt service on your existing database. I have used it successfully to purge hundreds of millions of rows of outdated data.
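If the batches still put too much load on the server, pt-archiver can also pause between them; for example, something like the following (the --sleep and --progress values are only illustrative, so check the options against your version of the tool):

pt-archiver --source h=host,D=db,t=VALUE --purge \
  --where 'NOT EXISTS(SELECT * FROM `KEY` WHERE key_id=`VALUE`.key_id)' \
  --limit 1000 --commit-each --sleep 1 --progress 10000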

pt-archiver is part of the Percona Toolkit for MySQL, a free (GPL) set of scripts that help with common tasks for MySQL and compatible databases.
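On most Linux distributions it can be installed from a package; on Debian or Ubuntu, for instance, the package is usually named percona-toolkit (verify the name for your distribution):

sudo apt-get install percona-toolkit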

Answered Oct 15 '22 by Bill Karwin


Directly from the MySQL documentation:

If you are deleting many rows from a large table, you may exceed the lock table size for an InnoDB table. To avoid this problem, or simply to minimize the time that the table remains locked, the following strategy (which does not use DELETE at all) might be helpful:

Select the rows not to be deleted into an empty table that has the same structure as the original table:

INSERT INTO t_copy SELECT * FROM t WHERE ... ;

Use RENAME TABLE to atomically move the original table out of the way and rename the copy to the original name:

RENAME TABLE t TO t_old, t_copy TO t;

Drop the original table:

DROP TABLE t_old;

No other sessions can access the tables involved while RENAME TABLE executes, so the rename operation is not subject to concurrency problems. See Section 12.1.9, “RENAME TABLE Syntax”.

So in your case you may do:

-- first create an empty copy with the same structure
CREATE TABLE value_copy LIKE `VALUE`;

INSERT INTO value_copy SELECT * FROM `VALUE` WHERE key_id IN
    (SELECT key_id FROM `KEY`);

RENAME TABLE `VALUE` TO value_old, value_copy TO `VALUE`;

DROP TABLE value_old;

And according to what they wrote there, the RENAME operation is quick and the number of records doesn't affect it.

Answered Oct 15 '22 by Gustek