Summary: I have a table populated via the following:
insert into the_table (...) select ... from some_other_table
Running the above query with no primary key on the_table is ~15x faster than running it with a primary key, and I don't understand why.
The details: I think this is best explained through code examples.
I have a table:
create table the_table (
a int not null,
b smallint not null,
c tinyint not null
);
If I add a primary key, this insert query is terribly slow:
alter table the_table
add constraint PK_the_table primary key(a, b);
-- Inserting ~880,000 rows
insert into the_table (a,b,c)
select a,b,c from some_view;
Without the primary key, the same insert query is about 15x faster. However, after populating the_table without a primary key, I can add the primary key constraint and that only takes a few seconds. This one really makes no sense to me.
More info:
Any ideas?
Actually its not as clear cut as Ryk suggests.
It can actually be faster to add data to a table with an index then in a heap.
Read this arctle - and as far as i am aware its quite well regarded:
http://www.sqlskills.com/blogs/kimberly/post/The-Clustered-Index-Debate-Continues.aspx
Bear in mind its written by SQL Server MVP and a Microsoft Regional Director.
Inserts are faster in a clustered table (but only in the "right" clustered table) than compared to a heap. The primary problem here is that lookups in the IAM/PFS to determine the insert location in a heap are slower than in a clustered table (where insert location is known, defined by the clustered key). Inserts are faster when inserted into a table where order is defined (CL) and where that order is ever-increasing. I have some simple numbers but I'm thinking about creating a much larger/complex scenario and publishing those. Simple/quick tests on a laptop are not always as "exciting".
I think if you create a simple primary key that is clustered and made up of a single auto-incrementing column, then inserts into such a table might be faster. Most likely, a primary key made up of multiple columns may be the cause of slowdown in inserts. When you use a composite key for primary key, then rows inserted may not get added to the end of table but may need to be added somewhere in the middle of existing physical order of rows in table, which adds to the insert time and hence makes the INSERTS slower. So use a single auto-incrementing column as the the primary key value in your case to speed up inserts.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With