I currently have a MySQL table of about 20 million rows, and I need to prune it. I'd like to remove every row whose updateTime
(timestamp of insertion) was more than one month ago. I have not personally performed any alterations of the table's order, so the data should be in the order in which it was inserted, and there is a UNIQUE
key on two fields, id
and updateTime
. How would I go about doing this in a short amount of time?
How much down time can you incur? How big are the rows? How many are you deleting?
Simply put, deleting rows is one of the most expensive things you can do to a table. It's just a horrible thing overall.
If you don't have to do it, and you have the disk space for it, and your queries aren't affected by the table size (well indexed queries typically ignore table size), then you may just leave well enough alone.
If you have the opportunity and can take the table offline (and you're removing a good percentage of the table), then your best bet would be to copy the rows you want to keep to a new table, drop the old one, rename the new one to the old name, and THEN recreate your indexes.
Otherwise, you're pretty much stuck with good 'ol delete.
There are two ways to remove a large number of rows. First there is the obvious way:
DELETE FROM table1 WHERE updateTime < NOW() - interval 1 month;
The second (slightly more complicated) way is to create a new table and copy the data that you want to keep, truncate your old table, then copy the rows back.
CREATE TABLE table2 AS
SELECT * FROM table1 WHERE updateTime >= NOW() - interval 1 month;
TRUNCATE table1;
INSERT INTO table1
SELECT * FROM table2;
Using TRUNCATE
is much faster than a DELETE
with a WHERE
clause when you have a large number of rows to delete and a relatively small number that you wish to keep.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With