
SQL Server index behaviour when doing bulk insert

I have an application that inserts multiple rows at once into SQL Server.

I use either the SqlBulkCopy class or self-written code that generates a gigantic insert into table_name(...) values (...) statement.

My table has several indexes and a clustered one.

The question is: how are those indexes updated? For each row I insert? For each transaction?

Somewhat odd question: is there a general term for this scenario, like 'bulk-insert indexing behaviour'? I tried to google several keyword combinations but haven't found anything. The reason I ask is that I sometimes work with Postgres and would like to know its behaviour as well.

I've been trying to find an article on this topic, several times, without any luck.

If you can point me to any docs, an article, or a book with a relevant chapter, that'd be great.

asked Jan 31 '18 by chester89

1 Answer

You can see how indexes are updated by examining the query plan. Consider this heap table with only non-clustered indexes.

CREATE TABLE dbo.BulkInsertTest(
      Column1 int NOT NULL
    , Column2 int NOT NULL
    , Column3 int NOT NULL
    , Column4 int NOT NULL
    , Column5 int NOT NULL
    );
CREATE INDEX BulkInsertTest_Column1 ON dbo.BulkInsertTest(Column1);
CREATE INDEX BulkInsertTest_Column2 ON dbo.BulkInsertTest(Column2);
CREATE INDEX BulkInsertTest_Column3 ON dbo.BulkInsertTest(Column3);
CREATE INDEX BulkInsertTest_Column4 ON dbo.BulkInsertTest(Column4);
CREATE INDEX BulkInsertTest_Column5 ON dbo.BulkInsertTest(Column5);
GO

Below is the execution plan for a singleton INSERT.

INSERT INTO dbo.BulkInsertTest(Column1, Column2, Column3, Column4, Column5) VALUES
     (1, 2, 3, 4, 5);

INSERT execution plan

The execution plan shows only the Table Insert operator, so the new non-clustered index rows were inserted intrinsically during the table insert operation itself. A large batch of singleton INSERT statements will yield this same plan for each insert statement.
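If you want to reproduce this yourself, one way (a generic sketch, not specific to this answer) is to capture the actual plan as XML with SET STATISTICS XML:

```sql
-- Return the actual execution plan as XML alongside the results
SET STATISTICS XML ON;

INSERT INTO dbo.BulkInsertTest (Column1, Column2, Column3, Column4, Column5)
VALUES (1, 2, 3, 4, 5);

SET STATISTICS XML OFF;
```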

I get a similar plan with a single INSERT statement specifying a large number of rows via a row constructor, with the only difference being the addition of a Constant Scan operator to emit the rows.

INSERT INTO dbo.BulkInsertTest(Column1, Column2, Column3, Column4, Column5) VALUES
     (1, 2, 3, 4, 5)
    ,(1, 2, 3, 4, 5)
    ,(1, 2, 3, 4, 5)
    ,...
    ,(1, 2, 3, 4, 5);

[Execution plan for the multi-row INSERT]

Here's the execution plan for a T-SQL BULK INSERT statement (using a dummy empty file as the source). With BULK INSERT, SQL Server added additional query plan operators to optimize the index inserts: the rows were spooled after being inserted into the table, and the rows from the spool were then sorted and inserted into each index separately as a mass insert operation. This method reduces the overhead of large insert operations. You may also see similar plans for INSERT...SELECT queries.
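For example, an INSERT...SELECT from a staging table can produce the same wide, per-index plan once the estimated row count is large enough (dbo.BulkInsertTestStaging is a hypothetical source table here, not from the original answer):

```sql
-- Hypothetical staging table as the source; with a large enough estimated
-- row count, the optimizer may choose the wide plan with a separate
-- Sort + Index Insert per non-clustered index.
INSERT INTO dbo.BulkInsertTest (Column1, Column2, Column3, Column4, Column5)
SELECT Column1, Column2, Column3, Column4, Column5
FROM dbo.BulkInsertTestStaging;
```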

BULK INSERT dbo.BulkInsertTest
    FROM 'c:\Temp\BulkInsertTest.txt';

BULK INSERT execution plan

I verified that SqlBulkCopy generates the same execution plan as a T-SQL BULK INSERT by capturing the actual plans with an Extended Event trace. Below is the trace DDL and PowerShell script I used.

Trace DDL:

CREATE EVENT SESSION [SqlBulkCopyTest] ON SERVER 
ADD EVENT sqlserver.query_post_execution_showplan(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[equal_i_sql_unicode_string]([sqlserver].[client_app_name],N'SqlBulkCopyTest') 
        AND [sqlserver].[like_i_sql_unicode_string]([sqlserver].[sql_text],N'insert bulk%') 
        ))
ADD TARGET package0.event_file(SET filename=N'SqlBulkCopyTest');
GO

PowerShell script:

$connectionString = "Data Source=.;Initial Catalog=YourUserDatabase;Integrated Security=SSPI;Application Name=SqlBulkCopyTest"

$dt = New-Object System.Data.DataTable;
$null = $dt.Columns.Add("Column1", [System.Type]::GetType("System.Int32"))
$null = $dt.Columns.Add("Column2", [System.Type]::GetType("System.Int32"))
$null = $dt.Columns.Add("Column3", [System.Type]::GetType("System.Int32"))
$null = $dt.Columns.Add("Column4", [System.Type]::GetType("System.Int32"))
$null = $dt.Columns.Add("Column5", [System.Type]::GetType("System.Int32"))

$row = $dt.NewRow()
[void]$dt.Rows.Add($row)
$row["Column1"] = 1
$row["Column2"] = 2
$row["Column3"] = 3
$row["Column4"] = 4
$row["Column5"] = 5

$bcp = New-Object System.Data.SqlClient.SqlBulkCopy($connectionString)
$bcp.DestinationTableName = "dbo.BulkInsertTest"
$bcp.WriteToServer($dt)

EDIT

Credit to Vladimir Baranov for providing this blog article by Microsoft Data Platform MVP Paul White, which details SQL Server's cost-based index maintenance strategy.

EDIT 2

I see from your revised question that your actual situation is a table with a clustered index rather than a heap. The plans will be similar to the heap examples above, except, of course, that the data will be inserted using a Clustered Index Insert operator instead of a Table Insert.
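For reference, a clustered-index version of the earlier example table might look like this (the key column and index name are illustrative, not from the original answer):

```sql
CREATE TABLE dbo.BulkInsertTestCI(
      Column1 int NOT NULL
    , Column2 int NOT NULL
    , Column3 int NOT NULL
    , Column4 int NOT NULL
    , Column5 int NOT NULL
    );
-- Clustering on Column1; inserts now flow through a Clustered Index Insert
CREATE CLUSTERED INDEX BulkInsertTestCI_Column1 ON dbo.BulkInsertTestCI(Column1);
GO
```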

An ORDER hint may be specified during bulk insert operations into a table with a clustered index. When the specified order matches that of the clustered index, SQL Server can eliminate the sort operator before the Clustered Index Insert since it assumes data are already sorted per the hint.
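A sketch of what that looks like with the T-SQL BULK INSERT statement, assuming the file is already sorted by Column1 and the clustered index is on Column1:

```sql
-- ORDER hint: tells SQL Server the source file is already sorted by Column1,
-- matching the clustered index key, so the Sort before the
-- Clustered Index Insert can be eliminated.
BULK INSERT dbo.BulkInsertTest
    FROM 'c:\Temp\BulkInsertTest.txt'
    WITH (ORDER (Column1 ASC));
```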

Unfortunately, System.Data.SqlClient.SqlBulkCopy does not provide a way to specify an ORDER hint. As @benjol mentioned in the comments, the newer Microsoft.Data.SqlClient.SqlBulkCopy includes a ColumnOrderHints property, where one can specify the target table's clustered index columns and sort order.
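Continuing the earlier PowerShell example, a sketch of how the order hint might be supplied with Microsoft.Data.SqlClient (assembly loading details vary by environment and are omitted; $connectionString and $dt are from the script above, and the clustered key on Column1 is an assumption):

```powershell
# Assumes the Microsoft.Data.SqlClient assembly is already loaded
$bcp = New-Object Microsoft.Data.SqlClient.SqlBulkCopy($connectionString)
$bcp.DestinationTableName = "dbo.BulkInsertTest"
# Declare that incoming rows are already sorted by the clustered key column
$null = $bcp.ColumnOrderHints.Add("Column1", [Microsoft.Data.SqlClient.SortOrder]::Ascending)
$bcp.WriteToServer($dt)
```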

answered Oct 19 '22 by Dan Guzman