i have some delete queries to run against some pretty huge table (~100 GB), and i want to optimize them as much as possible:
delete from table1 where column1 < date_sub(now(), interval 100 hour);
column1 is a datetime
column, i assume making an index for this column will speed up the deletions. besides that, anything i can do here? will using the date_sub()
function slow down the query? should i calculate that value before running the query?
delete from table2 where column2 = x;
column2 is the primary key for table2, so it's already an index according to the mysql documentation. my question is: the index kind is PRIMARY
, is that same as the INDEX
? do i have to make another index of the kind INDEX
for speeding up?
delete from table3 where column3 = y;
table3 has a compound primary key, which is column3 and column4. so i have an primary key index, but since the delete query doesn't use column4, should i make a separate index just for column3? or the combined primary key would do it?
i guess these are pretty basic questions, but i couldn't find a definite answer specific to my situation, so any help would be appreciated!
1) Run your delete statements with a LIMIT clause (Eg 1000) in a loop until the table is empty, if your deleting a lot. This causes the delete statement transaction to do less work per run.
If your DELETE
is intended to eliminate a great majority of the rows in that table, one thing that people often do is copy just the rows you want to keep to a duplicate table, and then use DROP TABLE
or TRUNCATE
to wipe out the original table much more quickly.
An index may help to find the rows you need to delete, but deleting requires updating the index. After deleting a lot of rows, the index may be imbalanced and requires some maintenance with OPTIMIZE TABLE
.
The DATE_SUB()
function is a constant expression (it does not vary row by row) so the query optimizer should be smart enough to factor it out and perform the calculation once.
You don't need to create an extra index for a primary key. The primary key constraint implicitly creates an index that gives the same benefit as a non-primary key index.
A compound index is probably just as useful as a single-column index, provided your search criteria reference the leftmost column(s) of the index. The "probably" caveat is due to the individual index nodes being larger and so it takes more memory to cache the index, but this is a small enough factor that I wouldn't create a whole other single-column index.
i assume making an index for this column will speed up the deletions.
Incorrect, because that same index needs to be updated for the index to be of any value for future use.
will using the date_sub() function slow down the query?
No, it's fine because it is not based on a column value. Functions performed on column values ensure that an index, if one existed on the column, can not be used.
the index kind is "PRIMARY", is that same as the "INDEX"?
It is, and the primary portion ensures that the values in that index are also unique.
do i have to make another index of the kind "INDEX" for speeding up?
No, you don't. MySQL also limits the total size of indexes that can be defined on a single table, depending on the type. 767 bytes is the stated index prefix limitation for InnoDB tables; it's 1,000 bytes for MyISAM tables.
table3 has a compound primary key, which is column3 and column4. so i have an primary key index, but since the delete query doesn't use column4, should i make a separate index just for column3? or the combined primary key would do it?
Test both setups & decide. I don't think the additional index is necessary myself.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With