Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does SQL Server work faster when you index a table after filling it?

I have a sproc that puts 750K records into a temp table through a query as one of its first actions. If I create indexes on the temp table before filling it, the item takes about twice as long to run compared to when I index after filling the table. (The index is an integer in a single column, the table being indexed is just two columns each a single integer.)

This seems a little off to me, but then I don't have the firmest understanding of what goes on under the hood. Does anyone have an answer for this?

like image 905
Dan Blair Avatar asked Aug 26 '08 19:08

Dan Blair


People also ask

Why is indexing faster in SQL?

Indexing makes columns faster to query by creating pointers to where data is stored within a database. Imagine you want to find a piece of information that is within a large database. To get this information out of the database the computer will look through every row until it finds it.

Is it better to create an index before filling a table with data or after the data is in place?

You should create an index for a table after inserting or loading data (via SQL*Loader or Import) into the table. It is more efficient to insert rows of data into a table that has no indexes and then create the indexes for subsequent access.

Do indexes speed up inserts?

If you update a table, the system has to maintain those indexes that are on the columns being updated. So having a lot of indexes can speed up select statements, but slow down inserts, updates, and deletes.

Should you create index before or after insert?

Creating index after data insert is more efficient way (it even often recomended to drop index before batch import and after import recreate it).


2 Answers

If you create a clustered index, it affects the way the data is physically ordered on the disk. It's better to add the index after the fact and let the database engine reorder the rows when it knows how the data is distributed.

For example, let's say you needed to build a brick wall with numbered bricks so that those with the highest number are at the bottom of the wall. It would be a difficult task if you were just handed the bricks in random order, one at a time - you wouldn't know which bricks were going to turn out to be the highest numbered, and you'd have to tear the wall down and rebuild it over and over. It would be a lot easier to handle that task if you had all the bricks lined up in front of you, and could organize your work.

That's how it is for the database engine - if you let it know about the whole job, it can be much more efficient than if you just feed it a row at a time.

like image 108
Jon Galloway Avatar answered Oct 18 '22 06:10

Jon Galloway


It's because the database server has to do calculations each and every time you insert a new row. Basically, you end up reindexing the table each time. It doesn't seem like a very expensive operation, and it's not, but when you do that many of them together, you start to see the impact. That's why you usually want to index after you've populated your rows, since it will just be a one-time cost.

like image 33
tghw Avatar answered Oct 18 '22 06:10

tghw