I have a stored procedure that is working with a large amount of data. I have that data being inserted in to a temp table. The overall flow of events is something like
CREATE #TempTable ( Col1 NUMERIC(18,0) NOT NULL, --This will not be an identity column. ,Col2 INT NOT NULL, ,Col3 BIGINT, ,Col4 VARCHAR(25) NOT NULL, --Etc... -- --Create primary key here? ) INSERT INTO #TempTable SELECT ... FROM MyTable WHERE ... INSERT INTO #TempTable SELECT ... FROM MyTable2 WHERE ... -- -- ...or create primary key here?
My question is when is the best time to create a primary key on my #TempTable table? I theorized that I should create the primary key constraint/index after I insert all the data because the index needs to be reorganized as the primary key info is being created. But I realized that my underlining assumption might be wrong...
In case it is relevant, the data types I used are real. In the #TempTable
table, Col1
and Col4
will be making up my primary key.
Update: In my case, I'm duplicating the primary key of the source tables. I know that the fields that will make up my primary key will always be unique. I have no concern about a failed alter table if I add the primary key at the end.
Though, this aside, my question still stands as which is faster assuming both would succeed?
In the #TempTable table, Col1 and Col4 will be making up my primary key. Update: In my case, I'm duplicating the primary key of the source tables. I know that the fields that will make up my primary key will always be unique.
Every table can have (but does not have to have) a primary key. The column or columns defined as the primary key ensure uniqueness in the table; no two rows can have the same key. The primary key of one table may also help to identify records in other tables, and be part of the second table's primary key.
SQL temp tables are created using CREATE TABLE T-SQL statement, but table variables are created using DECLARE @name Table T-SQL statement. You can ALTER the SQL Server temp tables after creating it, but table variables don't support any DDL statement like ALTER statement.
This depends a lot.
If you make the primary key index clustered after the load, the entire table will be re-written as the clustered index isn't really an index, it is the logical order of the data. Your execution plan on the inserts is going to depend on the indexes in place when the plan is determined, and if the clustered index is in place, it will sort prior to the insert. You will typically see this in the execution plan.
If you make the primary key a simple constraint, it will be a regular (non-clustered) index and the table will simply be populated in whatever order the optimizer determines and the index updated.
I think the overall quickest performance (of this process to load temp table) is usually to write the data as a heap and then apply the (non-clustered) index.
However, as others have noted, the creation of the index could fail. Also, the temp table does not exist in isolation. Presumably there is a best index for reading the data from it for the next step. This index will need to either be in place or created. This is where you have to make a tradeoff of speed here for reliability (apply the PK and any other constraints first) and speed later (have at least the clustered index in place if you are going to have one).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With