 

Bulk Insert into a HEAP vs CLUSTERED index where minimal logging is not an option (SQL Server 2008)

The tool currently used is Informatica, and we have bookend stored procedures that drop the clustered indexes before the load and then add them back afterwards. In the procedure that re-creates the clustered indexes, the index DDL is hard-coded (we don't generate it from the sys catalog views out of a fear that Microsoft could change them and the regenerated index would be wrong or the build would fail). This causes issues where people create new clustered indexes but don't think to update the stored procedure, and the next time the bulk load runs those indexes are gone. We previously did this for all indexes, but switched the non-clustered indexes to a disable/rebuild approach. That is not an option for the clustered index, because a disabled clustered index is essentially the table itself and we would no longer be able to insert into it.
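For context, a minimal sketch of the bookend pattern described above, using hypothetical table and index names:

    -- Pre-load bookend: drop the clustered index so the bulk load goes into a heap.
    -- dbo.FactSales and IX_FactSales_CI are hypothetical names.
    DROP INDEX IX_FactSales_CI ON dbo.FactSales;

    -- ... Informatica bulk load runs here ...

    -- Post-load bookend: re-create the clustered index from hard-coded DDL.
    -- This is the statement that goes stale when someone changes an index
    -- on the table but forgets to update the procedure.
    CREATE CLUSTERED INDEX IX_FactSales_CI
        ON dbo.FactSales (SaleDate, SaleID);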

Performance is important but not everything. Good performance with easy maintainability trumps great performance with complex maintainability.

After reading many sources, it seems almost universally agreed that for a bulk insert of data that is not ordered the same way as your primary key, inserting into a heap and applying the primary key afterwards is faster ( http://msdn.microsoft.com/en-us/library/ms177445.aspx , http://msdn.microsoft.com/en-us/library/dd425070(v=sql.100).aspx ). However, most of these sources make assumptions that do not hold at my organization or with my toolset.

Due to our standards policies we have to use the FULL recovery model, so minimal logging will not occur regardless of whether I load into a heap or a clustered index.
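As a quick illustration (the database name is hypothetical), the recovery model can be confirmed from sys.databases; under FULL, bulk operations are always fully logged:

    -- Check the effective recovery model; minimal logging requires
    -- SIMPLE or BULK_LOGGED (plus other conditions such as a TABLOCK hint).
    SELECT name, recovery_model_desc
    FROM sys.databases
    WHERE name = 'YourDatabase';  -- hypothetical database name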

According to our Informatica admins, specifying TABLOCK or ORDER hints for the bcp load is not possible through the UI, and our organization is averse to customization beyond the UI for maintainability reasons.
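For reference, if the load could be driven outside the UI, the hints in question are passed to bcp with the -h switch (database, table, file, and column names below are hypothetical):

    bcp YourDatabase.dbo.FactSales in FactSales.dat -T -c -h "TABLOCK, ORDER(SaleDate ASC, SaleID ASC)"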

So, given all the factors above, would you recommend that we continue with our somewhat unreliable stored procedures, insert directly into the clustered index, or go with some third, far superior solution? I also realize there are other Stack Overflow questions similar to this one, but they do not address bulk loading specifically and/or make the same assumptions in their answers.

Asked Aug 25 '11 by JStead


People also ask

What fields are not allowed in a HEAP table?

HEAP indexes cannot be built on BLOB or TEXT fields. HEAP tables cannot use partial keys (prefixes). HEAP tables do not support AUTO_INCREMENT fields. HEAP indexes can only use the '=' and '<=>' operators. (This describes MySQL's HEAP/MEMORY storage engine, not a SQL Server heap.)

What is the minimum number of rows you need to bulk insert into a Columnstore index?

Bulk loading fills rowgroups to maximum capacity and compresses them directly into the columnstore. Only rows at the end of a load that don't meet the minimum of 102,400 rows per rowgroup go to the deltastore. To perform a bulk load, you can use bcp Utility, Integration Services, or select rows from a staging table.
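As a minimal sketch of the staging-table path mentioned above (table and column names are hypothetical; columnstore bulk loading applies to later SQL Server versions, not 2008):

    -- Insert from a staging table into a clustered columnstore table.
    -- With TABLOCK and at least 102,400 rows per rowgroup, rows are
    -- compressed directly into the columnstore instead of the deltastore.
    INSERT INTO dbo.FactSales_CCI WITH (TABLOCK)
    SELECT SaleID, SaleDate, Amount
    FROM dbo.FactSales_Staging;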

What is minimal logging in SQL Server?

Minimal logging involves logging only the limited information required to roll back a transaction. Minimal logging doesn't support PITR (point-in-time recovery). In other words, one cannot restore a transaction-log backup to a specific point in time if the backup file contains any bulk-logged transactions.
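For illustration, switching a database to a recovery model that permits minimal logging looks like this (database name and backup path are hypothetical; the question's standards policy rules this out):

    -- Temporarily allow minimal logging for bulk operations.
    -- Point-in-time restore is lost for the log backup that covers the bulk load.
    ALTER DATABASE YourDatabase SET RECOVERY BULK_LOGGED;

    -- ... perform the bulk load ...

    ALTER DATABASE YourDatabase SET RECOVERY FULL;
    BACKUP LOG YourDatabase TO DISK = 'YourDatabase_postload.trn';  -- hypothetical path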

Can a non-clustered index have multiple columns?

In SQL Server, a non-clustered index can include up to 1,023 columns, but it must have at least one key column in order to be created.
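A small example of a non-clustered index with multiple key columns and included columns (table, index, and column names are hypothetical):

    -- Two key columns plus two non-key (INCLUDE) columns.
    CREATE NONCLUSTERED INDEX IX_FactSales_Date_Customer
        ON dbo.FactSales (SaleDate, CustomerID)
        INCLUDE (Amount, StoreID);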


1 Answer

My suggestion would be to bulk load into a staging table (a heap, or a clustered index matching the file order), build or rebuild the clustered index there so that it matches the destination table, and then insert straight from the staging table. To reduce blocking, lock escalation, log use, etc., you can do this in batches of 10,000 rows at a time, committing and/or checkpointing every so often.
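A minimal sketch of that approach, assuming hypothetical table and column names and the FULL recovery model described in the question:

    -- Hypothetical objects: dbo.FactSales_Staging (loaded by Informatica as a heap)
    -- and dbo.FactSales (destination, clustered on SaleDate, SaleID).

    -- Cluster the staging table the same way as the destination so the
    -- insert below reads rows in destination key order.
    CREATE CLUSTERED INDEX IX_FactSales_Staging_CI
        ON dbo.FactSales_Staging (SaleDate, SaleID);

    -- Move rows in batches; each INSERT commits on its own, which limits
    -- blocking, lock escalation, and log growth.
    DECLARE @batch INT = 10000;

    WHILE 1 = 1
    BEGIN
        INSERT INTO dbo.FactSales (SaleID, SaleDate, Amount)
        SELECT TOP (@batch) s.SaleID, s.SaleDate, s.Amount
        FROM dbo.FactSales_Staging AS s
        WHERE NOT EXISTS (SELECT 1 FROM dbo.FactSales AS f
                          WHERE f.SaleID = s.SaleID)
        ORDER BY s.SaleDate, s.SaleID;

        IF @@ROWCOUNT = 0 BREAK;

        -- Under FULL recovery, periodic log backups (not CHECKPOINT) keep the
        -- transaction log in check; tracking a key watermark would also be
        -- cheaper than NOT EXISTS for very large loads.
    END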

You might also consider using a pre-processor (C#, perhaps) that takes the source file and builds a new one in the proper sort order.

Also, I think you are safer using sys.indexes etc. than hard-coding the index structures in the procedure. Microsoft is far less likely to change a column name in sys.indexes than someone at your shop (no offense intended) is to change an index and forget to update the hard-coded definition in the procedure.
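As an illustrative sketch (not the answer's exact code), the clustered index definitions could be generated from the catalog views before the drop and replayed after the load:

    -- Build CREATE CLUSTERED INDEX statements from sys.indexes / sys.index_columns
    -- for every user table, so nothing hard-coded can go stale.
    SELECT
        'CREATE ' + CASE WHEN i.is_unique = 1 THEN 'UNIQUE ' ELSE '' END
        + 'CLUSTERED INDEX ' + QUOTENAME(i.name)
        + ' ON ' + QUOTENAME(SCHEMA_NAME(t.schema_id)) + '.' + QUOTENAME(t.name)
        + ' (' + STUFF((
              SELECT ', ' + QUOTENAME(c.name)
                     + CASE WHEN ic.is_descending_key = 1 THEN ' DESC' ELSE '' END
              FROM sys.index_columns AS ic
              JOIN sys.columns AS c
                ON c.object_id = ic.object_id AND c.column_id = ic.column_id
              WHERE ic.object_id = i.object_id
                AND ic.index_id = i.index_id
                AND ic.is_included_column = 0
              ORDER BY ic.key_ordinal
              FOR XML PATH('')), 1, 2, '') + ');' AS create_stmt
    FROM sys.indexes AS i
    JOIN sys.tables AS t ON t.object_id = i.object_id
    WHERE i.type = 1            -- clustered indexes only
      AND i.is_primary_key = 0  -- PK-backed clustered indexes need ALTER TABLE ... ADD CONSTRAINT instead
      AND t.is_ms_shipped = 0;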

Answered Sep 27 '22 by Aaron Bertrand