I'm currently trying to bulk insert a DataTable into a database. It works fine and fast. The only problem occurs if there are rows that are already in the database (duplicate key).
To counter this I have modified my program so that it first checks, for each new entry, whether it already exists in the database or not. Which is... slow (in the current cases I don't have many entries, but later on it's over 200k entries that I need to check, and that several times). Thus I need to make it faster than it is now (if possible).
The datatable is structured this way:
DataTable transactionTable = new DataTable();
transactionTable.Columns.Add("DeviceId", typeof(Int32));
transactionTable.Columns.Add("LogDate", typeof(DateTime));
transactionTable.Columns.Add("LogType", typeof(Int32));
transactionTable.Columns.Add("LogText", typeof(String));
transactionTable.PrimaryKey = new DataColumn[3] {
    transactionTable.Columns[0],
    transactionTable.Columns[1],
    transactionTable.Columns[2]
};
What I have so far is the following:
DataTable insertTable = transactionTable.Copy();
insertTable.Clear();

using (SqlConnection sqlCon = new SqlConnection(this.GetConnString()))
{
    sqlCon.Open();
    foreach (var entry in transactionTable.AsEnumerable())
    {
        using (SqlCommand sqlCom = sqlCon.CreateCommand())
        {
            sqlCom.Parameters.Clear();
            sqlCom.CommandText = "SELECT 1 FROM myTable WHERE"
                + " DeviceId = @DeviceId AND LogDate = @LogDate"
                + " AND LogType = @LogType";
            sqlCom.Parameters.AddWithValue("@DeviceId", entry.Field<Int32>("DeviceId"));
            sqlCom.Parameters.AddWithValue("@LogDate", entry.Field<DateTime>("LogDate"));
            sqlCom.Parameters.AddWithValue("@LogType", entry.Field<Int32>("LogType"));

            using (SqlDataReader myRead = sqlCom.ExecuteReader())
            {
                myRead.Read();
                if (myRead.HasRows == false)
                {
                    insertTable.Rows.Add(entry.ItemArray);
                }
            }
        }
    }
}
// And afterwards the bulk insert, which I think is out of scope for the question itself
// (I use insertTable there).
Now my question is: is there any way to do this faster so that I don't run into the key violation problem?
In this case I would use a staging table. The steps would be:
1) Bulk-copy the new rows into an empty staging table with SqlBulkCopy.
2) Call a stored procedure that inserts into the base table only the rows that are not already there.
3) Call a stored procedure that truncates the staging table.
So you will need to delete the foreach statement in your code, add a stored proc for inserting into the base table, and add a stored proc for truncating the staging table. Or you can combine the last two steps into one; the sketch below does exactly that.
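Roughly, the .NET side could look like the following sketch (untested, and assuming the usual System.Data / System.Data.SqlClient usings). The staging table name myTable_Staging and the procedure name dbo.InsertFromStaging are only placeholders for whatever you create:
using (SqlConnection sqlCon = new SqlConnection(this.GetConnString()))
{
    sqlCon.Open();

    // 1. Bulk-copy everything into an empty staging table that has the same
    //    columns as myTable but no primary key, so the copy cannot fail on duplicates.
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(sqlCon))
    {
        bulkCopy.DestinationTableName = "dbo.myTable_Staging";
        bulkCopy.WriteToServer(transactionTable);
    }

    // 2. One stored procedure (placeholder name) inserts only the rows that are
    //    not yet in myTable and then truncates the staging table (steps 2 and 3 combined).
    using (SqlCommand sqlCom = sqlCon.CreateCommand())
    {
        sqlCom.CommandType = CommandType.StoredProcedure;
        sqlCom.CommandText = "dbo.InsertFromStaging";
        sqlCom.ExecuteNonQuery();
    }
}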
I have a similar setup. I'm using a stored procedure with a table-valued parameter and a MERGE statement. See also Table-Valued Parameters for an example of how to use them in .NET.
I would shift the focus of the problem from a simple bulk insert to merging a batch of rows into a table with existing data.
Destination table
CREATE TABLE [dbo].[MyTable](
[DeviceId] [int] NOT NULL,
[LogDate] [datetime] NOT NULL,
[LogType] [int] NOT NULL,
[LogText] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED
(
[DeviceId] ASC,
[LogDate] ASC,
[LogType] ASC
))
Create user-defined table type
CREATE TYPE [dbo].[MyTableType] AS TABLE(
[DeviceId] [int] NOT NULL,
[LogDate] [datetime] NOT NULL,
[LogType] [int] NOT NULL,
[LogText] [nvarchar](50) NOT NULL,
PRIMARY KEY CLUSTERED
(
[DeviceId] ASC,
[LogDate] ASC,
[LogType] ASC
))
Test and measure whether specifying the PRIMARY KEY for the TYPE makes the overall process faster or slower.
Stored procedure with TVP
CREATE PROCEDURE [dbo].[MergeMyTable]
    @ParamRows dbo.MyTableType READONLY
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    BEGIN TRANSACTION;
    BEGIN TRY
        MERGE INTO dbo.MyTable AS Dest
        USING
        (
            SELECT
                TT.[DeviceId],
                TT.[LogDate],
                TT.[LogType],
                TT.[LogText]
            FROM
                @ParamRows AS TT
        ) AS Src
        ON
            (Dest.[DeviceId] = Src.[DeviceId]) AND
            (Dest.[LogDate] = Src.[LogDate]) AND
            (Dest.[LogType] = Src.[LogType])
        WHEN MATCHED THEN
            UPDATE SET
                Dest.[LogText] = Src.[LogText]
        WHEN NOT MATCHED BY TARGET THEN
            INSERT
                ([DeviceId]
                ,[LogDate]
                ,[LogType]
                ,[LogText])
            VALUES
                (Src.[DeviceId],
                 Src.[LogDate],
                 Src.[LogType],
                 Src.[LogText]);

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;
    END CATCH;
END
Call this stored procedure passing it a batch of rows to merge. Test and measure how performance changes with the size of the batch. Try batches with 1K, 10K, 100K rows.
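For reference, calling it from your C# could look roughly like this (a sketch, assuming System.Data.SqlClient and the transactionTable from the question; the column order of the DataTable must match dbo.MyTableType):
using (SqlConnection sqlCon = new SqlConnection(this.GetConnString()))
using (SqlCommand sqlCom = new SqlCommand("dbo.MergeMyTable", sqlCon))
{
    sqlCom.CommandType = CommandType.StoredProcedure;

    // Pass the whole DataTable as the table-valued parameter.
    SqlParameter tvpParam = sqlCom.Parameters.AddWithValue("@ParamRows", transactionTable);
    tvpParam.SqlDbType = SqlDbType.Structured;
    tvpParam.TypeName = "dbo.MyTableType";

    sqlCon.Open();
    sqlCom.ExecuteNonQuery();
}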
If you never want to update existing rows with new values, remove the WHEN MATCHED THEN part of the MERGE; it will work faster.
You can drop and recreate your index with the IGNORE_DUP_KEY option set to ON. Something like this:
ALTER TABLE myTable
ADD CONSTRAINT PK_myTable
PRIMARY KEY CLUSTERED (DeviceId, LogDate, LogType)
WITH (IGNORE_DUP_KEY = ON)
What this option does is report a duplicate key error with a different severity and message when any duplicate inserts for the index are attempted. It will not allow duplicates to be entered, but it will continue to insert all the records that are not duplicates and only give a warning message if duplicates were found and ignored.
More info at this link: Creating Unique Indexes.
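With that option on, the pre-filtering loop from the question should no longer be necessary; as far as I know the duplicates are simply skipped with a warning, so you could bulk-copy the whole DataTable straight into the table. A sketch (untested, using the transactionTable and connection helper from the question):
using (SqlConnection sqlCon = new SqlConnection(this.GetConnString()))
{
    sqlCon.Open();
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(sqlCon))
    {
        // Duplicate keys are ignored by the index option, so no pre-check is needed.
        bulkCopy.DestinationTableName = "dbo.myTable";
        bulkCopy.WriteToServer(transactionTable);
    }
}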