 

Deleting old records from a MySQL table but keeping a backup

I have a MySQL table that only requires data up to 30 days prior to today's date, but it holds data that can be up to a few years after today's date. For faster querying, I usually delete the old records, since I see no point in searching through them. However, I still maintain a backup copy of the records in case we ever need them for analytics. The original table is this:

CREATE TABLE featured_deal (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    fsa VARCHAR(10),
    poster_id INT(11),
    dealid BIGINT(20),
    bookedDate DATE,
    createDate TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE KEY `featured_unique` (fsa, bookedDate)
)

And I create a table that is a replica of this table called history:

CREATE TABLE featured_deal_history (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    fsa VARCHAR(10),
    poster_id INT(11),
    dealid BIGINT(20),
    bookedDate DATE,
    createDate TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE KEY `featured_unique` (fsa, bookedDate)
)

I have a trigger to populate the history table whenever an insert happens on the original table:

CREATE TRIGGER `featured_to_history`
    AFTER INSERT ON featured_deal
    FOR EACH ROW
    INSERT INTO featured_deal_history (fsa, poster_id, dealid, bookedDate, createDate)
    VALUES (NEW.fsa, NEW.poster_id, NEW.dealid, NEW.bookedDate, NEW.createDate)

Finally, I clean the table using a cron job and the command:

    DELETE FROM featured_deal WHERE bookedDate < DATE_SUB(CURDATE(), INTERVAL 30 DAY)
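If keeping the schedule inside the database is ever preferable to cron, the same DELETE can be run by MySQL's event scheduler. A minimal sketch, assuming the scheduler is enabled and using a made-up event name:

    -- requires the event scheduler (SET GLOBAL event_scheduler = ON)
    CREATE EVENT purge_old_featured_deals
    ON SCHEDULE EVERY 1 DAY
    DO
        DELETE FROM featured_deal
        WHERE bookedDate < DATE_SUB(CURDATE(), INTERVAL 30 DAY);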

Is there a better way to perform the above task? I thought about MySQL partitions. However, I don't have a fixed partition: the date changes, so I'd need two new partitions every day.
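For reference, a monthly RANGE-partitioned layout would look roughly like the sketch below. Note that MySQL requires every unique key, including the primary key, to contain the partitioning column, so both keys would have to change; the table name here is made up:

    CREATE TABLE featured_deal_partitioned (
        id INT NOT NULL AUTO_INCREMENT,
        fsa VARCHAR(10),
        poster_id INT(11),
        dealid BIGINT(20),
        bookedDate DATE NOT NULL,
        createDate TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (id, bookedDate),
        UNIQUE KEY `featured_unique` (fsa, bookedDate)
    )
    PARTITION BY RANGE (TO_DAYS(bookedDate)) (
        PARTITION p202210 VALUES LESS THAN (TO_DAYS('2022-11-01')),
        PARTITION p202211 VALUES LESS THAN (TO_DAYS('2022-12-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    );

    -- expiring a whole month is then a fast metadata operation
    ALTER TABLE featured_deal_partitioned DROP PARTITION p202210;

New months would still need a periodic ALTER TABLE ... REORGANIZE PARTITION against pmax, so partition maintenance does not disappear entirely.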

asked Nov 02 '22 by Anshuman Biswas

2 Answers

In principle your approach is OK, however the concept is predicated on the idea that a smaller table is more performant. This implies that your queries are running full table scans against the data - i.e. you've not configured your indexes correctly.

I suggest that the first thing you fix is the performance of your queries.

If you still need to keep stuff out of the hot data table, then you should seek to do any inserts into the history table as a bulk operation, NOT a row at a time - this will keep the table and indexes in a healthy state. This could be done in a batch operation as suggested by Cristian, or you could use a stochastic method (with a state variable in the source table), e.g. something like...

-- sketch: on roughly 5% of inserts, flush every unarchived row to the
-- backup table in bulk, using a `control` column as the state flag
-- (note: MySQL forbids a trigger from modifying its own table, so this
-- body belongs in a stored procedure or event rather than the trigger itself)
AFTER INSERT ON mytable
IF (0.95 < RAND()) THEN
   -- claim all rows not yet archived
   UPDATE mytable SET control = CONNECTION_ID()
   WHERE control IS NULL;
   -- copy the claimed rows in a single bulk insert
   INSERT INTO backuptable (...)
     SELECT .... FROM mytable
     WHERE control = CONNECTION_ID();
   -- mark them as archived
   UPDATE mytable SET control = -1
   WHERE control = CONNECTION_ID();
END IF;
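The sketch assumes a nullable state column on the source table, something like (names made up):

    -- hypothetical state flag used by the sketch above
    ALTER TABLE mytable
        ADD COLUMN control INT NULL,
        ADD INDEX idx_control (control);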

One further consideration is that you generate a new id when you write to the history table: why?
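If the history rows should mirror the source rows exactly, one option is to copy NEW.id across in the question's trigger; MySQL accepts explicit values in an AUTO_INCREMENT column. A sketch:

    CREATE TRIGGER `featured_to_history`
        AFTER INSERT ON featured_deal
        FOR EACH ROW
        INSERT INTO featured_deal_history (id, fsa, poster_id, dealid, bookedDate, createDate)
        VALUES (NEW.id, NEW.fsa, NEW.poster_id, NEW.dealid, NEW.bookedDate, NEW.createDate)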

answered Nov 15 '22 by symcbean


I would make it simpler. Create a daily cron job which executes these two queries, substituting today's date for TODAY_DATE:

    CREATE TABLE IF NOT EXISTS featured_deal_new LIKE featured_deal
    RENAME TABLE featured_deal TO featured_deal_history_TODAY_DATE,
                 featured_deal_new TO featured_deal

What will happen: renaming tables is very fast, so the swap is nearly instantaneous, and you end up with a history table for each day.
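To resolve the TODAY_DATE placeholder without leaving SQL, the date can be spliced into the table name with a prepared statement (a sketch; RENAME TABLE is on MySQL's list of statements permitted inside PREPARE):

    -- build the dated table name, then execute the rename dynamically
    SET @sql = CONCAT('RENAME TABLE featured_deal TO featured_deal_history_',
                      DATE_FORMAT(CURDATE(), '%Y%m%d'),
                      ', featured_deal_new TO featured_deal');
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;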

Feel free to combine the history tables afterwards:

    INSERT INTO featured_deal_history ... SELECT * FROM featured_deal_history_TODAY_DATE

    DROP TABLE featured_deal_history_TODAY_DATE

This way you do not lose performance on the inserts into the main table.

answered Nov 15 '22 by Jinxmcg