I have a MySQL table that only requires data up to 30 days prior to today's date. It also holds data dated up to a few years after today's date. For faster querying, I usually delete the old records, since I see no point in searching through them. However, I still maintain a backup copy of the records in case we ever need them for analytics. The original table is this:
CREATE TABLE featured_deal (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
fsa VARCHAR(10),
poster_id int(11),
dealid bigint(20),
bookedDate date,
createDate timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
UNIQUE KEY `featured_unique`(fsa, bookedDate)
)
And I create a history table that is a replica of this table:
CREATE TABLE featured_deal_history (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
fsa VARCHAR(10),
poster_id int(11),
dealid bigint(20),
bookedDate date,
createDate timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
UNIQUE KEY `featured_unique`(fsa, bookedDate)
)
I have a trigger to populate the history table whenever an insert happens on the original table:
CREATE TRIGGER `featured_to_history`
AFTER INSERT ON featured_deal
FOR EACH ROW
INSERT INTO featured_deal_history (fsa, poster_id, dealid, bookedDate, createDate)
VALUES (NEW.fsa, NEW.poster_id, NEW.dealid, NEW.bookedDate, NEW.createDate);
Finally, I clean the table using a cron job and the command:
DELETE FROM featured_deal WHERE bookedDate < DATE_SUB(CURDATE(), INTERVAL 30 DAY);
Is there a better way to perform the above task? I thought about MySQL partitions. However, I don't have a fixed partitioning scheme: the cutoff date changes, and hence I'd need two new partitions every day.
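For reference, partitioning by bookedDate would look something like the sketch below, and MySQL would also force me to widen the primary key to (id, bookedDate), since every unique key must contain the partitioning column:

-- Illustrative only: daily RANGE partitions on bookedDate.
-- Assumes the primary key is changed to (id, bookedDate); the partition
-- list itself would need daily maintenance.
ALTER TABLE featured_deal
PARTITION BY RANGE (TO_DAYS(bookedDate)) (
    PARTITION p20130101 VALUES LESS THAN (TO_DAYS('2013-01-02')),
    PARTITION p20130102 VALUES LESS THAN (TO_DAYS('2013-01-03')),
    PARTITION pmax      VALUES LESS THAN MAXVALUE
);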
In principle your approach is OK; however, the concept is predicated on the idea that a smaller table is more performant. This implies that your queries are running full table scans against the data - i.e. you've not configured your indexes correctly.
I suggest that the first thing you fix is the performance of your queries.
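For example, if your hot queries filter on bookedDate (as your cleanup job does), an index on that column lets MySQL use a range scan instead of a full table scan. A sketch only - the exact columns depend on the WHERE clauses you actually run:

ALTER TABLE featured_deal ADD INDEX idx_bookedDate (bookedDate);

-- Check the index is being used: look for type=range rather than type=ALL
EXPLAIN SELECT * FROM featured_deal
WHERE bookedDate >= DATE_SUB(CURDATE(), INTERVAL 30 DAY);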
If you still need to keep stuff out of the hot data table, then you should do any inserts into the history table as a bulk operation, NOT a row at a time - this will keep the table and indexes in a healthy state. This could be done as a batch operation, as suggested by Cristian, or you could use a stochastic method (with a state variable in the source table), e.g. something like...
-- Sketch only: assumes a nullable `control` column on mytable that tracks
-- which rows have already been copied to the backup table.
AFTER INSERT ON mytable
FOR EACH ROW
IF (0.95 < RAND()) THEN                  -- flush on roughly 5% of inserts
    UPDATE mytable SET control = CONNECTION_ID()
        WHERE control IS NULL;           -- claim the rows not yet copied
    INSERT INTO backuptable (...)
        SELECT .... FROM mytable
        WHERE control = CONNECTION_ID();
    UPDATE mytable SET control = -1      -- mark the claimed rows as archived
        WHERE control = CONNECTION_ID();
END IF;
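This assumes a nullable state column on the source table, something along these lines (the column name and type are just placeholders):

ALTER TABLE mytable ADD COLUMN control INT NULL DEFAULT NULL;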
One further consideration is that you generate a new id when you write to the history table: why?
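If you want the ids to line up, the trigger could carry the original id across instead of letting the history table generate its own - a sketch using your table names:

CREATE TRIGGER `featured_to_history`
AFTER INSERT ON featured_deal
FOR EACH ROW
INSERT INTO featured_deal_history (id, fsa, poster_id, dealid, bookedDate, createDate)
VALUES (NEW.id, NEW.fsa, NEW.poster_id, NEW.dealid, NEW.bookedDate, NEW.createDate);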
I would make it simpler. Create a daily cron job which executes these two queries, with "TODAY_DATE" replaced by the current date:
create table if not exists featured_deal_new like featured_deal
rename table featured_deal to featured_deal_history_TODAY_DATE, featured_deal_new to featured_deal
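For example, the cron job could be a small shell script that fills in TODAY_DATE (the credentials and database name below are placeholders):

#!/bin/sh
# Hypothetical daily rotation script - adjust user, password and database name.
TODAY_DATE=$(date +%Y%m%d)
mysql -u dbuser -p'secret' mydb -e "
  create table if not exists featured_deal_new like featured_deal;
  rename table featured_deal to featured_deal_history_$TODAY_DATE,
               featured_deal_new to featured_deal;"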
What will happen: renaming tables is very fast, so the swap is nearly instantaneous, and you will end up with a history table for each day.
Feel free to combine the history tables afterwards:
insert into featured_deal_history... select * from featured_deal_history_TODAY_DATE
Drop table featured_deal_history_TODAY_DATE
This way you do not lose performance on the inserts into the main table.