I need to load a large data set onto a production database.
There are 15 files, about 500 MB each, and each one needs to be uploaded and inserted into a table.
I have two ID columns that need to be indexed. If I load the files with the indexes in place, the upload takes around 3 hours. If I drop the indexes, run LOAD DATA LOCAL INFILE, and then re-add the indexes, the whole operation takes about 30 minutes.
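The drop/load/re-add sequence described above can be sketched as a small Python helper that only builds the MySQL statements; it does not connect to a database. The table name, file names, and index/column names are hypothetical, and the files are assumed to be in LOAD DATA's default format (tab-separated fields, one row per line).

```python
from typing import List, Sequence, Tuple


def bulk_load_plan(table: str, files: Sequence[str],
                   indexes: Sequence[Tuple[str, str]]) -> List[str]:
    """Build the statements for a drop-indexes / load / re-add-indexes import.

    `indexes` is a list of hypothetical (index_name, column_name) pairs.
    """
    statements: List[str] = []
    # 1. Drop the secondary indexes so the load does not maintain them row by row.
    drops = ", ".join(f"DROP INDEX {name}" for name, _ in indexes)
    statements.append(f"ALTER TABLE {table} {drops};")
    # 2. Load each file (default LOAD DATA field/line format assumed).
    for path in files:
        statements.append(f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE {table};")
    # 3. Re-add all indexes in one ALTER TABLE instead of one statement per index.
    adds = ", ".join(f"ADD INDEX {name} ({col})" for name, col in indexes)
    statements.append(f"ALTER TABLE {table} {adds};")
    return statements


if __name__ == "__main__":
    for stmt in bulk_load_plan("events", ["part01.tsv", "part02.tsv"],
                               [("idx_user", "user_id"), ("idx_item", "item_id")]):
        print(stmt)
```

Step 3 is the part that causes the slowdown described in the question: the index build runs as a single statement, so this plan by itself does nothing to lower its priority.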
The problem is, database responsiveness takes a big hit while indexing the freshly imported data. Is there a way to make the indexing run at a "low priority" so that other queries still get 95-100% speed and the indexing kind of chugs along in the background?
I'm using Amazon RDS, so I don't have the option of just loading on a different server then copying over the table files.
Adding a bounty to this as I still want to see if there is a way to get good performance while indexing on a specific box.
Yes, indexes have a performance cost, although for SELECTs they usually help rather than hurt. It is important to understand how database engines operate. Data is stored on disk in "pages", and an index makes it possible to go straight to the page that holds a specific value in one or more of the table's columns.
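A toy Python model of the page/index idea: the table is split into fixed-size pages, and the "index" is a plain dict from an id value to the page numbers that hold it. This is an illustration under those simplifying assumptions, not how InnoDB or any real engine lays out data (real engines use B-tree structures), and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str]   # (id, payload); the id column is the indexed one
Page = List[Row]


def paginate(rows: List[Row], page_size: int) -> List[Page]:
    """Split the table into fixed-size pages, the unit that is read from disk."""
    return [rows[i:i + page_size] for i in range(0, len(rows), page_size)]


def build_index(pages: List[Page]) -> Dict[int, List[int]]:
    """Map each id value to the numbers of the pages that contain it."""
    index: Dict[int, List[int]] = defaultdict(list)
    for page_no, page in enumerate(pages):
        for row_id, _ in page:
            index[row_id].append(page_no)
    return index


def full_scan(pages: List[Page], target: int) -> Tuple[Optional[Row], int]:
    """No index: read pages in order until the row turns up. Returns (row, pages_read)."""
    for page_no, page in enumerate(pages):
        for row in page:
            if row[0] == target:
                return row, page_no + 1
    return None, len(pages)


def indexed_lookup(pages: List[Page], index: Dict[int, List[int]],
                   target: int) -> Tuple[Optional[Row], int]:
    """With an index: read only the page(s) the index points to."""
    pages_read = 0
    for page_no in index.get(target, []):
        pages_read += 1
        for row in pages[page_no]:
            if row[0] == target:
                return row, pages_read
    return None, pages_read


if __name__ == "__main__":
    pages = paginate([(i, f"row{i}") for i in range(1000)], page_size=100)
    index = build_index(pages)
    print(full_scan(pages, 950))              # ((950, 'row950'), 10)
    print(indexed_lookup(pages, index, 950))  # ((950, 'row950'), 1)
```

Without the index, finding id 950 reads all 10 pages; with it, only the one page that holds the row is read.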
Indexes degrade INSERT and DELETE performance, because every index has to be updated as well. For UPDATE, it depends on whether you change indexed columns; if you don't, performance should not be affected. Indexes can also speed up DELETE and UPDATE statements if the WHERE condition can make use of the index.
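A toy Python model of the write-side cost, using a hypothetical `ToyTable` class in which in-memory dicts stand in for indexes. Every insert must add an entry to each index, while an update only touches an index when it changes that index's column; `find` shows a WHERE-style lookup using an index when one exists. It only counts index writes and does not model I/O, locking, or page splits.

```python
from typing import Any, Dict, Iterable, List, Set


class ToyTable:
    """Rows keyed by primary key, plus one dict-based index per indexed column."""

    def __init__(self, indexed_columns: Iterable[str]) -> None:
        self.rows: Dict[int, Dict[str, Any]] = {}
        # column -> value -> set of primary keys with that value
        self.indexes: Dict[str, Dict[Any, Set[int]]] = {c: {} for c in indexed_columns}
        self.index_writes = 0

    def insert(self, pk: int, row: Dict[str, Any]) -> None:
        self.rows[pk] = dict(row)
        # Every index must be updated for every inserted row.
        for col, idx in self.indexes.items():
            idx.setdefault(row[col], set()).add(pk)
            self.index_writes += 1

    def update(self, pk: int, changes: Dict[str, Any]) -> None:
        row = self.rows[pk]
        for col, new_value in changes.items():
            # Only an indexed column whose value actually changes costs an index write.
            if col in self.indexes and row[col] != new_value:
                idx = self.indexes[col]
                idx[row[col]].discard(pk)
                idx.setdefault(new_value, set()).add(pk)
                self.index_writes += 1
            row[col] = new_value

    def find(self, col: str, value: Any) -> List[int]:
        """Use the index if the column has one, otherwise scan every row."""
        if col in self.indexes:
            return sorted(self.indexes[col].get(value, set()))
        return sorted(pk for pk, r in self.rows.items() if r[col] == value)


if __name__ == "__main__":
    t = ToyTable(["user_id", "item_id"])
    for i in range(3):
        t.insert(i, {"user_id": i % 2, "item_id": i, "note": "x"})
    print(t.index_writes)             # 6: three rows x two indexes
    t.update(0, {"note": "y"})        # non-indexed column: no index write
    t.update(0, {"user_id": 1})       # indexed column: one index write
    print(t.index_writes)             # 7
```

With two indexes, each inserted row costs two extra index writes, which is the per-row overhead behind the slower load with indexes in place.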
Indexing makes columns faster to query by creating pointers to where the data is stored within the database. Imagine you want to find a piece of information in a large database. Without an index, the database has to look through every row until it finds it.
The Drawbacks of Using Indexes

- Indexes consume disk space: an index occupies its own space, so indexed data consumes more disk space.
- Redundant and duplicate indexes can be a problem: MySQL allows you to create duplicate indexes on a column and does not "protect you" from making such a mistake.
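Since MySQL won't flag duplicate indexes for you, a check can be sketched in Python. This is a simplified sketch under stated assumptions: index definitions are given as hypothetical name-to-column-tuple mappings (in practice they would come from SHOW INDEX or information_schema.STATISTICS), and it ignores UNIQUE and PRIMARY KEY semantics (a unique index on (a) is not redundant just because (a, b) exists, since it enforces a constraint), column prefix lengths, and non-B-tree index types.

```python
from typing import Dict, Tuple


def redundant_indexes(indexes: Dict[str, Tuple[str, ...]]) -> Dict[str, str]:
    """Map each redundant index name to an index that makes it redundant.

    An index counts as redundant if another index has exactly the same columns,
    or has more columns and starts with the same ones (a leftmost prefix), since
    a B-tree index on (a, b) can also serve lookups on (a) alone.
    """
    # Prefer the widest covering index, then alphabetical order, for a stable result.
    candidates = sorted(indexes.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    result: Dict[str, str] = {}
    for name, cols in sorted(indexes.items()):
        for other, other_cols in candidates:
            if other == name:
                continue
            longer_cover = len(other_cols) > len(cols) and other_cols[:len(cols)] == cols
            # For exact duplicates, keep the alphabetically first one.
            earlier_duplicate = other_cols == cols and other < name
            if longer_cover or earlier_duplicate:
                result[name] = other
                break
    return result


if __name__ == "__main__":
    print(redundant_indexes({
        "idx_user": ("user_id",),
        "idx_user_dup": ("user_id",),             # exact duplicate
        "idx_user_item": ("user_id", "item_id"),  # covers both of the above
        "idx_item": ("item_id",),                 # not a leftmost prefix of anything
    }))
```

Here both single-column `user_id` indexes are reported as covered by `idx_user_item`, while `idx_item` is kept because `item_id` is not the leading column of any other index.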
Well, I never found a way to throttle, but I did figure out a way to alleviate my problem. The solution was unique to my problem, but I'll post it in case someone else finds it useful.
I wrote a class named CautiousIndexer. [...] prevent_indexing_($name). This was still fine in terms of efficiency, but during indexing on the master server, write performance slowed unacceptably. I'm still looking for a way to index with throttling.
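One commonly used workaround, which is not the CautiousIndexer approach above (that class's code isn't shown), is to keep the indexes in place and throttle the load itself: insert in small batches and idle between them so other queries get a turn. The Python sketch below assumes a hypothetical `insert_batch` callback (for example, a DB-API `executemany` wrapper) and uses a duty cycle: after each batch it sleeps long enough that loading occupies at most `duty_cycle` of the wall-clock time. It does not throttle an `ALTER TABLE ... ADD INDEX`, which runs as one statement; it spreads the cost of loading into an already-indexed table over a longer period.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` rows."""
    batch: List[T] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def throttled_load(rows: Iterable[T],
                   insert_batch: Callable[[List[T]], None],
                   batch_size: int = 1000,
                   duty_cycle: float = 0.25,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Insert rows in batches, pausing so loading uses at most `duty_cycle` of wall time.

    Returns the number of batches inserted. `clock` and `sleep` are injectable for testing.
    """
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in (0, 1]")
    batches = 0
    for batch in batched(rows, batch_size):
        start = clock()
        insert_batch(batch)
        elapsed = clock() - start
        batches += 1
        # Idle for elapsed * (1 - d) / d, so work / (work + idle) == d.
        # For simplicity this also sleeps after the final batch.
        sleep(elapsed * (1 - duty_cycle) / duty_cycle)
    return batches
```

With `duty_cycle=0.25`, a batch that takes 1 second is followed by a 3-second pause. The total load time grows accordingly, so this trades the 30-minute drop-and-rebuild for a slower load that competes less with other queries.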