I have an application that inserts multiple rows at once into SQL Server.
I use either the SqlBulkCopy class or self-written code that generates a gigantic INSERT INTO table_name(...) VALUES (...) statement.
My table has several non-clustered indexes and one clustered index.
The question is: how are those indexes updated? For each row I insert? For each transaction?
A somewhat odd follow-up question: is there a general term for this scenario, something like 'bulk-insert indexing behaviour'? I tried googling several keyword combinations but haven't found anything. The reason I ask is that I sometimes work with Postgres as well and would like to know its behaviour too.
I've been trying to find an article on this topic, several times, without any luck.
If you can point me to any docs, article or a book with a relevant chapter, that'd be great
You can see how indexes are updated by examining the query plan. Consider this heap table with only non-clustered indexes.
CREATE TABLE dbo.BulkInsertTest(
Column1 int NOT NULL
, Column2 int NOT NULL
, Column3 int NOT NULL
, Column4 int NOT NULL
, Column5 int NOT NULL
);
CREATE INDEX BulkInsertTest_Column1 ON dbo.BulkInsertTest(Column1);
CREATE INDEX BulkInsertTest_Column2 ON dbo.BulkInsertTest(Column2);
CREATE INDEX BulkInsertTest_Column3 ON dbo.BulkInsertTest(Column3);
CREATE INDEX BulkInsertTest_Column4 ON dbo.BulkInsertTest(Column4);
CREATE INDEX BulkInsertTest_Column5 ON dbo.BulkInsertTest(Column5);
GO
Below is the execution plan for a singleton INSERT.
INSERT INTO dbo.BulkInsertTest(Column1, Column2, Column3, Column4, Column5) VALUES
(1, 2, 3, 4, 5);
The execution plan shows only the Table Insert operator, so the new non-clustered index rows were inserted during the table insert operation itself. A large batch of singleton INSERT statements will yield this same plan for each statement.
I get a similar plan with a single INSERT statement that specifies a large number of rows via a row constructor; the only difference is the addition of a Constant Scan operator to emit the rows.
INSERT INTO dbo.BulkInsertTest(Column1, Column2, Column3, Column4, Column5) VALUES
(1, 2, 3, 4, 5)
,(1, 2, 3, 4, 5)
,(1, 2, 3, 4, 5)
,...
,(1, 2, 3, 4, 5);
Here's the execution plan for a T-SQL BULK INSERT statement (using a dummy empty file as the source). With BULK INSERT, SQL Server added additional query plan operators to optimize the index inserts. The rows were spooled after being inserted into the table, and then the rows from the spool were sorted and inserted into each index separately as a mass insert operation. This method reduces the overhead of large insert operations. You may also see similar plans for INSERT...SELECT queries.
BULK INSERT dbo.BulkInsertTest
FROM 'c:\Temp\BulkInsertTest.txt';
I verified that SqlBulkCopy generates the same execution plan as a T-SQL BULK INSERT by capturing the actual plans with an Extended Events trace. Below are the trace DDL and the PowerShell script I used.
Trace DDL:
CREATE EVENT SESSION [SqlBulkCopyTest] ON SERVER
ADD EVENT sqlserver.query_post_execution_showplan(
ACTION(sqlserver.client_app_name,sqlserver.sql_text)
WHERE ([sqlserver].[equal_i_sql_unicode_string]([sqlserver].[client_app_name],N'SqlBulkCopyTest')
AND [sqlserver].[like_i_sql_unicode_string]([sqlserver].[sql_text],N'insert bulk%')
))
ADD TARGET package0.event_file(SET filename=N'SqlBulkCopyTest');
GO
PowerShell script:
$connectionString = "Data Source=.;Initial Catalog=YourUserDatabase;Integrated Security=SSPI;Application Name=SqlBulkCopyTest"
$dt = New-Object System.Data.DataTable;
$null = $dt.Columns.Add("Column1", [System.Type]::GetType("System.Int32"))
$null = $dt.Columns.Add("Column2", [System.Type]::GetType("System.Int32"))
$null = $dt.Columns.Add("Column3", [System.Type]::GetType("System.Int32"))
$null = $dt.Columns.Add("Column4", [System.Type]::GetType("System.Int32"))
$null = $dt.Columns.Add("Column5", [System.Type]::GetType("System.Int32"))
$row = $dt.NewRow()
$row["Column1"] = 1
$row["Column2"] = 2
$row["Column3"] = 3
$row["Column4"] = 4
$row["Column5"] = 5
[void]$dt.Rows.Add($row)
$bcp = New-Object System.Data.SqlClient.SqlBulkCopy($connectionString)
$bcp.DestinationTableName = "dbo.BulkInsertTest"
$bcp.WriteToServer($dt)
EDIT
Credit to Vladimir Baranov for providing this blog article by Microsoft Data Platform MVP Paul White, which details SQL Server's cost-based index maintenance strategy.
EDIT 2
I see from your revised question that your actual situation is a table with a clustered index rather than a heap. The plans will be similar to that of the heap examples above, except of course, that data will be inserted using a Clustered Index Insert operator instead of a Table Insert.
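As a point of reference, the clustered variant of the test table can be created like this (a minimal sketch; the table name BulkInsertTest_CI is hypothetical, and the column list simply mirrors the heap example above):

```sql
-- Hypothetical clustered variant of the earlier heap test table.
CREATE TABLE dbo.BulkInsertTest_CI(
    Column1 int NOT NULL
    , Column2 int NOT NULL
    , Column3 int NOT NULL
    , Column4 int NOT NULL
    , Column5 int NOT NULL
);
-- Clustered index replaces the heap; non-clustered indexes as before.
CREATE CLUSTERED INDEX BulkInsertTest_CI_Column1 ON dbo.BulkInsertTest_CI(Column1);
CREATE INDEX BulkInsertTest_CI_Column2 ON dbo.BulkInsertTest_CI(Column2);
```

With this table, the bulk-insert plans show a Clustered Index Insert operator where the heap plans showed a Table Insert.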
An ORDER hint may be specified for bulk insert operations into a table with a clustered index. When the specified order matches that of the clustered index, SQL Server can eliminate the sort operator before the Clustered Index Insert, since it assumes the data are already sorted per the hint.
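A minimal sketch of the hint, assuming the clustered index is on Column1 ascending and the source file is already sorted that way:

```sql
-- Assumes the clustered index is on Column1 ASC and the file
-- is pre-sorted by Column1; otherwise the hint will not help
-- (and rows violating the stated order cause an error).
BULK INSERT dbo.BulkInsertTest
FROM 'c:\Temp\BulkInsertTest.txt'
WITH (ORDER (Column1 ASC));
```

If the hint matches the clustered index and the data really are sorted, the plan omits the sort before the Clustered Index Insert.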
Unfortunately, System.Data.SqlClient.SqlBulkCopy does not expose an ORDER hint. As @benjol mentioned in the comments, the newer Microsoft.Data.SqlClient.SqlBulkCopy includes a ColumnOrderHints property where one can specify the target table's clustered index columns and sort order.
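Building on the PowerShell script above, the hint can be supplied like this (a sketch, assuming the Microsoft.Data.SqlClient assembly is loaded and the clustered index is on Column1 ascending):

```powershell
# Assumes Microsoft.Data.SqlClient is already loaded and $connectionString
# and $dt are set up as in the script above; Column1 ASC is assumed to
# match the target table's clustered index.
$bcp = New-Object Microsoft.Data.SqlClient.SqlBulkCopy($connectionString)
$bcp.DestinationTableName = "dbo.BulkInsertTest"
$hint = New-Object Microsoft.Data.SqlClient.SqlBulkCopyColumnOrderHint(
    "Column1", [Microsoft.Data.SqlClient.SortOrder]::Ascending)
[void]$bcp.ColumnOrderHints.Add($hint)
$bcp.WriteToServer($dt)
```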