
Most efficient way to add an index to a large MySQL table

I have a MySQL database containing a table with a little over 6 million rows and no indexes. A simple query such as SELECT ... FROM log ORDER BY log_date ASC takes an unacceptable amount of time. I obviously need to add indexes to the table, but I am unsure of the most efficient way to go about this.

1) My first option would be to issue ALTER TABLE log ADD INDEX (log_date), but I'm not sure how long it would take. Would it take approximately as long as the query above? If so, that is unacceptable.

2) My second option would be to export the table, TRUNCATE it, issue the ALTER TABLE statement, and then re-import the data. I'm not sure how long the re-import would take, and I am concerned about what would happen if the system tries to write rows to the table during the process.

Is anyone able to offer insight into the best way to index a moderately large table in a production system without causing too much grief?

Timothy asked Dec 29 '22 07:12


2 Answers

Our experiments indicate that the best-performing approach is to create a new, empty table with the same structure and add the index to it, then copy the rows from the original table into it. After that, rename the old table to a temporary name, give the new table the original name, and finally drop the old table.
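
As a rough sketch of those steps in MySQL, assuming the table is named log as in the question. The names log_new, log_old, and idx_log_date are placeholders made up for illustration:

```sql
-- Create an empty copy of the table definition.
CREATE TABLE log_new LIKE log;

-- Add the index while the new table is still empty.
ALTER TABLE log_new ADD INDEX idx_log_date (log_date);

-- Copy every row across; the index is maintained as rows are inserted.
INSERT INTO log_new SELECT * FROM log;

-- Swap the two tables in a single atomic statement.
RENAME TABLE log TO log_old, log_new TO log;

-- Once the new table has been checked, remove the old one.
DROP TABLE log_old;
```

Rows written to log between the start of the copy and the RENAME may be missing from the new table, so writes would likely need to be paused for the duration.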

ebeland answered Dec 30 '22 19:12


As far as I know, option 2 will actually cause you more hassle than option 1, not just because it involves more shuffling about of data but because importing the data to the table and building the index at the same time is going to take longer than just importing the data and then building the entire index.

Think about it: if MySQL is importing the table data and building the index at the same time, then at every row inserted it not only has to insert that row but also insert a row into the index. That's going to be slower than just building the index outright from already-present data. On the other hand, if you restore the data in its entirety and then build the index, you're back to option 1 but you've pointlessly emptied and refilled the table in the meantime.

I don't think you're going to get better performance than by biting the bullet and issuing the ALTER TABLE command. Perhaps you can schedule it to run at a time when usage is lower, such as the middle of the night?
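
For reference, the statement might look like the following. This is only a sketch, assuming the log table and log_date column from the question. The index name idx_log_date is made up, and the ALGORITHM and LOCK clauses assume InnoDB on MySQL 5.6 or later, where they ask the server to build the index without blocking reads and writes and to fail with an error if it cannot.

```sql
-- Build the index in place, allowing concurrent reads and writes
-- (assumes InnoDB on MySQL 5.6+; the statement errors out if unsupported).
ALTER TABLE log
  ADD INDEX idx_log_date (log_date),
  ALGORITHM=INPLACE, LOCK=NONE;

-- Confirm the index exists.
SHOW INDEX FROM log;

-- Check whether the ORDER BY query can now use the index.
EXPLAIN SELECT log_date FROM log ORDER BY log_date ASC;
```

On older versions or other storage engines, dropping the ALGORITHM and LOCK clauses leaves a plain ALTER TABLE, which is where running it during a quiet period matters most.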

Hammerite answered Dec 30 '22 21:12