
Is there a way to throttle the indexing of MySQL tables so overall performance is not impacted?

I need to load a large data set onto a production database.

15 files each need to be uploaded and inserted into a table. Each is about 500 MB.

I have two ID columns that need to be indexed. If I load the files with the indexes in place, the upload takes around 3 hours. If I drop the indexes, run LOAD DATA LOCAL INFILE, and then re-add the indexes, the whole operation takes about 30 minutes.
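For reference, the drop/load/re-index sequence can be sketched roughly as below. This only builds the SQL statements in order; the table name, index names, column names, and file paths are placeholders, not the real schema, and actually executing the statements would need a MySQL client library of your choice.

```python
def bulk_load_plan(table, indexes, files):
    """Return SQL for: drop indexes, load files, re-add indexes.

    `indexes` maps a (hypothetical) index name to the column it covers.
    """
    statements = []

    # 1. Drop the secondary indexes so the load does not maintain them per row.
    drops = ", ".join(f"DROP INDEX `{name}`" for name in indexes)
    statements.append(f"ALTER TABLE `{table}` {drops}")

    # 2. Load each file into the now unindexed table.
    for path in files:
        statements.append(f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE `{table}`")

    # 3. Rebuild all indexes in a single ALTER TABLE statement.
    adds = ", ".join(
        f"ADD INDEX `{name}` (`{col}`)" for name, col in indexes.items()
    )
    statements.append(f"ALTER TABLE `{table}` {adds}")
    return statements


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    plan = bulk_load_plan(
        "events",
        {"idx_a": "a_id", "idx_b": "b_id"},
        ["/tmp/part01.csv", "/tmp/part02.csv"],
    )
    for statement in plan:
        print(statement + ";")
```

The last statement is the one that causes the slowdown described below: it rebuilds both indexes over the full data set in one pass.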

The problem is, database responsiveness takes a big hit while indexing the freshly imported data. Is there a way to make the indexing run at a "low priority" so that other queries still get 95-100% speed and the indexing kind of chugs along in the background?

I'm using Amazon RDS, so I don't have the option of just loading on a different server then copying over the table files.

Adding a bounty to this as I still want to see if there is a way to get good performance while indexing on a specific box.

asked May 10 '11 00:05 by Zak




1 Answer

Well, I never found a way to throttle, but I did figure out a way to alleviate my problem. The solution was unique to my problem, but I'll post it in case someone else finds it useful.

I wrote a class named CautiousIndexer.

  1. First I saved the CREATE TABLE statement needed to recreate the table structure without indexes. Then I looped through an array of read slave databases and, on each one, renamed the table holding the unindexed data to prevent_indexing_($name).
  2. Then I ran the saved CREATE TABLE statement on the slaves only. This effectively moved the data out of the way of the indexing statements that would be run on the master.
  3. Then I ran the index query against the master. The read slaves saw no performance impact while the master was indexing, because their newly created tables were empty.
  4. When the master finished indexing, I took one of the slaves out of production rotation, dropped the empty table, moved the full table back into place, then indexed the table on that out-of-production slave.
  5. When that finished, I put the slave back in production and repeated the indexing procedure on each of the remaining slaves.
  6. When all slaves were indexed, I put the table into production.
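The ordering of the steps above can be sketched as a plan builder. This is not the original CautiousIndexer class, only an illustration of the sequence; the function name, host names, and the two SQL strings passed in are hypothetical, and the "rotation" entries stand in for whatever mechanism takes a slave in and out of production.

```python
def cautious_index_plan(table, create_sql, index_sql, master, slaves):
    """Return an ordered list of (host, action) pairs for the procedure."""
    parked = f"prevent_indexing_{table}"
    plan = []

    # Steps 1-2: on each slave, park the loaded data under another name
    # and put an empty, unindexed table in its place.
    for slave in slaves:
        plan.append((slave, f"RENAME TABLE `{table}` TO `{parked}`"))
        plan.append((slave, create_sql))

    # Step 3: index on the master while the slaves hold only empty tables.
    plan.append((master, index_sql))

    # Steps 4-5: one slave at a time, out of rotation, swap the full table
    # back in, index it locally, then return the slave to rotation.
    for slave in slaves:
        plan.append((slave, "-- remove from production rotation"))
        plan.append((slave, f"DROP TABLE `{table}`"))
        plan.append((slave, f"RENAME TABLE `{parked}` TO `{table}`"))
        plan.append((slave, index_sql))
        plan.append((slave, "-- return to production rotation"))

    return plan


if __name__ == "__main__":
    # Placeholder schema and hosts, for illustration only.
    steps = cautious_index_plan(
        "events",
        "CREATE TABLE `events` (a_id INT, b_id INT)",
        "ALTER TABLE `events` ADD INDEX `idx_a` (`a_id`), ADD INDEX `idx_b` (`b_id`)",
        "master",
        ["slave1", "slave2"],
    )
    for host, action in steps:
        print(f"{host}: {action}")
```

Only one slave is out of rotation at any time, so read capacity drops by a single slave during the per-slave indexing phase.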

This was still fine in terms of efficiency, but while the master was indexing, its write performance was unacceptably slow. I'm still looking for a way to index with throttling.

answered Sep 28 '22 08:09 by Zak