In a comment I read:
"Just as a side note, it's sometimes faster to drop the indices of your table and recreate them after the bulk insert operation."
Is this true? Under which circumstances?
Removing indexes prior to large inserts on a table, including when using SQL Bulk Insert, may be a best practice to increase performance.
You should create an index for a table after inserting or loading data (via SQL*Loader or Import) into the table. It is more efficient to insert rows of data into a table that has no indexes and then create the indexes for subsequent access.
The number of indexes on a table is the most dominant factor for insert performance. The more indexes a table has, the slower the execution becomes. The insert statement is the only operation that cannot directly benefit from indexing because it has no where clause.
So having a lot of indexes can speed up select statements, but slow down inserts, updates, and deletes. Note: Updates and deletes with WHERE clauses can use indexes for scans, even if the indexed column is being updated.
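To make the note about WHERE clauses concrete, here is a minimal sketch using SQLite through Python's standard sqlite3 module. The table `t`, the index `idx_t_a`, and the helper `query_plan` are made up for illustration, and other database engines report their plans differently. It asks the engine how it would find the rows for an UPDATE and a DELETE that filter on an indexed column.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the human-readable steps SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The description is the last column of each plan row.
    return [row[-1] for row in rows]

# Hypothetical table and index, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b REAL)")
conn.execute("CREATE INDEX idx_t_a ON t (a)")

# Both statements have a WHERE clause, so the index can help locate rows.
update_plan = query_plan(conn, "UPDATE t SET b = 0 WHERE a = 5")
delete_plan = query_plan(conn, "DELETE FROM t WHERE a = 5")

print(update_plan)
print(delete_plan)
```

The exact wording of the plan text depends on the SQLite version, so treat the printed output as something to inspect rather than to match literally.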
Like Joel, I will echo the statement that yes, it can be true. I've found that the key to identifying the scenario he mentioned lies in the distribution of the data and the size of the indexes on the specific table.
I used to support an application that did a regular bulk import of 1.8 million rows into a table with 90 columns and 4 indexes, one of which spanned 11 columns. With the indexes in place, the import took over 20 hours to complete. Dropping the indexes, inserting, and re-creating the indexes took only 1 hour and 25 minutes.
So it can be a big help, but a lot of it comes down to your data, the indexes, and the distribution of data values.
Yes, it is true. When there are indexes on the table during an insert, the server will need to be constantly re-ordering/paging the table to keep the indexes up to date. If you drop the indexes, it can just add the rows without worrying about that, and then build the indexes all at once when you re-create them.
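The drop, load, and re-create sequence described above can be sketched as follows. This is only an illustration using SQLite through Python's standard sqlite3 module; the `orders` table, the index names, and the helper functions are all hypothetical, and a production database would have its own bulk-load tooling and DDL syntax.

```python
import sqlite3

# Hypothetical secondary indexes, kept as DDL so they can be rebuilt later.
INDEX_DDL = {
    "idx_orders_customer": "CREATE INDEX idx_orders_customer ON orders (customer_id)",
    "idx_orders_amount": "CREATE INDEX idx_orders_amount ON orders (amount)",
}

def create_schema(conn):
    """Create the illustrative table together with its secondary indexes."""
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    for ddl in INDEX_DDL.values():
        conn.execute(ddl)

def bulk_load_dropping_indexes(conn, rows):
    """Drop the secondary indexes, insert all rows, then rebuild the indexes."""
    for name in INDEX_DDL:
        conn.execute(f"DROP INDEX IF EXISTS {name}")
    # With no secondary indexes present, each insert only touches the table.
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)", rows
    )
    # Each index is now built in one pass over the loaded data.
    for ddl in INDEX_DDL.values():
        conn.execute(ddl)
    conn.commit()

def index_names(conn):
    """List the indexes currently defined on the orders table."""
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'orders'"
    )
    return sorted(row[0] for row in cur)
```

A caller would run `create_schema(conn)` once and then pass an iterable of `(customer_id, amount)` tuples to `bulk_load_dropping_indexes`. Keeping the index DDL in one place is a design choice so the rebuild step cannot drift from the original definitions.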
The exception, of course, is when the import data is already in index order. In fact, I should note that I'm working on a project right now where the opposite effect was observed. We wanted to reduce the run time of a large import (a nightly dump from a mainframe system). We tried removing the indexes, importing the data, and re-creating them. It actually significantly increased the time for the import to complete. But this is not typical. It just goes to show that you should always test first for your particular system.
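One way to "test first" is a small timing harness like the sketch below. It uses an in-memory SQLite database via Python's sqlite3 module purely as a stand-in; the table `t`, its two indexes, and the functions `make_rows` and `time_load` are invented for this example, and timings from SQLite say nothing definite about how another server will behave. The point is only the shape of the comparison: the same rows loaded once with the indexes in place and once with the indexes built afterwards.

```python
import random
import sqlite3
import time

def make_rows(n, seed=0):
    """Generate n pseudo-random (a, b) rows; the seed makes runs repeatable."""
    rng = random.Random(seed)
    return [(rng.randrange(1_000_000), rng.random()) for _ in range(n)]

def time_load(rows, build_indexes_after):
    """Load rows into a fresh table and return (elapsed_seconds, row_count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b REAL)")
    index_ddl = [
        "CREATE INDEX idx_t_a ON t (a)",
        "CREATE INDEX idx_t_b ON t (b)",
    ]
    if not build_indexes_after:
        # Indexes exist during the load and are maintained row by row.
        for ddl in index_ddl:
            conn.execute(ddl)
    start = time.perf_counter()
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    if build_indexes_after:
        # Indexes are built once, after all rows are in place.
        for ddl in index_ddl:
            conn.execute(ddl)
    conn.commit()
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, count

if __name__ == "__main__":
    rows = make_rows(200_000)
    for label, data in (("random order", rows), ("sorted order", sorted(rows))):
        with_idx, _ = time_load(data, build_indexes_after=False)
        rebuilt, _ = time_load(data, build_indexes_after=True)
        print(f"{label}: indexes in place {with_idx:.3f}s, built afterwards {rebuilt:.3f}s")
```

Running it with both random and pre-sorted input is a cheap way to see whether the "data already in index order" case changes the picture, but the real test should be repeated against your own server, schema, and data volume.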