I am writing into two tables on SQL Server row by row from C#.
My C# app is passing parameters into 2 stored procedures which are each inserting rows into tables.
Each time I call a stored procedure I open and then close the connection.
I need to write about 100m rows into the database.
Should I be closing and opening the connection every time I call the stored procedure?
Here is an example of what I am doing:
public static void Insert_TestResults(TestResults testresults)
{
    try
    {
        DbConnection cn = GetConnection2();
        cn.Open();

        // stored procedure
        DbCommand cmd = GetStoredProcCommand(cn, "Insert_TestResults");
        DbParameter param;

        param = CreateInParameter("TestName", DbType.String);
        param.Value = testresults.TestName;
        cmd.Parameters.Add(param);

        if (testresults.Result != -9999999999M)
        {
            param = CreateInParameter("Result", DbType.Decimal);
            param.Value = testresults.Result;
            cmd.Parameters.Add(param);
        }

        param = CreateInParameter("NonNumericResult", DbType.String);
        param.Value = testresults.NonNumericResult;
        cmd.Parameters.Add(param);

        param = CreateInParameter("QuickLabDumpID", DbType.Int32);
        param.Value = testresults.QuickLabDumpID;
        cmd.Parameters.Add(param);

        // execute
        cmd.ExecuteNonQuery();

        if (cn.State == ConnectionState.Open)
            cn.Close();
    }
    catch (Exception e)
    {
        throw e;
    }
}
Here is the stored procedure on the server:
USE [SalesDWH]
GO

/****** Object: StoredProcedure [dbo].[Insert_TestResults] Script Date: 12/26/2011 10:45:08 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

-- =============================================
-- Author:      <Author,,Name>
-- Create date: <Create Date,,>
-- Description: <Description,,>
-- =============================================
ALTER PROCEDURE [dbo].[Insert_TestResults]
    -- Add the parameters for the stored procedure here
    @TestName varchar(500),
    @Result decimal(18,4) = NULL,
    @NonNumericResult varchar(50) = NULL,
    @QuickLabDumpid int
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    INSERT INTO [SalesDWH].[dbo].[TestResults]
        ([TestName]
        ,[Result]
        ,[NonNumericResult]
        ,[QuickLabDumpid])
    VALUES
        (@TestName, @Result, @NonNumericResult, @QuickLabDumpid)
END
At this rate, the roughly 100 million rows will take about 3 days, which seems too slow to me. What can I do to speed this up? And what is standard practice for opening/closing the connection this many times?
You don't need to open a connection for each request. You can open it once at the beginning and close it when you're done. However, with connection pooling enabled (as it is by default), opening and closing a connection is not an expensive process.
Your procedure is slow mostly for two reasons:
1. Each INSERT runs as its own transaction, so the transaction log has to be flushed to disk for every single row.
2. Each row is a separate round trip from your application to the server.
The fix for the first is to group your inserts into transactions -- maybe 1000 rows per transaction or something like that.
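For illustration only, here is a rough sketch of that batching. It uses plain System.Data.SqlClient types rather than your GetConnection2/GetStoredProcCommand helpers, and connectionString, batchSize and the rows collection are placeholder names:

// Requires: System, System.Collections.Generic, System.Data, System.Data.SqlClient
public static void InsertTestResultsBatched(IEnumerable<TestResults> rows,
                                            string connectionString,   // placeholder
                                            int batchSize = 1000)
{
    using (var cn = new SqlConnection(connectionString))
    {
        cn.Open();                                   // open once, reuse for every row

        SqlTransaction tx = cn.BeginTransaction();
        int pending = 0;

        foreach (var r in rows)
        {
            using (var cmd = new SqlCommand("dbo.Insert_TestResults", cn, tx))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.AddWithValue("@TestName", r.TestName);
                cmd.Parameters.AddWithValue("@Result",
                    r.Result == -9999999999M ? (object)DBNull.Value : r.Result);
                cmd.Parameters.AddWithValue("@NonNumericResult",
                    (object)r.NonNumericResult ?? DBNull.Value);
                cmd.Parameters.AddWithValue("@QuickLabDumpid", r.QuickLabDumpID);
                cmd.ExecuteNonQuery();
            }

            if (++pending == batchSize)              // commit every batchSize rows
            {
                tx.Commit();
                tx = cn.BeginTransaction();
                pending = 0;
            }
        }

        tx.Commit();                                 // commit the final partial batch
    }
}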
The fix for the second is to use either command batching (send multiple statements at a time, separated by semicolons) or table-valued parameters (TVPs). TVPs are also nice because an INSERT INTO ... SELECT ... FROM the TVP executes as a single statement in a single transaction.
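If you go the TVP route, the C# side could look roughly like this. The type dbo.TestResultsType and the procedure dbo.Insert_TestResults_Bulk are assumptions made for the sketch; you would create the table type yourself and write a proc that takes it READONLY and does an INSERT INTO dbo.TestResults ... SELECT ... FROM the parameter:

// Sketch only: one call, one round trip, all rows passed as a table-valued parameter.
public static void InsertTestResultsTvp(IEnumerable<TestResults> rows, string connectionString)
{
    var table = new DataTable();
    table.Columns.Add("TestName", typeof(string));
    table.Columns.Add("Result", typeof(decimal));
    table.Columns.Add("NonNumericResult", typeof(string));
    table.Columns.Add("QuickLabDumpid", typeof(int));

    foreach (var r in rows)
        table.Rows.Add(r.TestName,
                       r.Result == -9999999999M ? (object)DBNull.Value : r.Result,
                       (object)r.NonNumericResult ?? DBNull.Value,
                       r.QuickLabDumpID);

    using (var cn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.Insert_TestResults_Bulk", cn))   // hypothetical proc
    {
        cmd.CommandType = CommandType.StoredProcedure;
        SqlParameter p = cmd.Parameters.AddWithValue("@Rows", table);
        p.SqlDbType = SqlDbType.Structured;
        p.TypeName = "dbo.TestResultsType";          // hypothetical table type name

        cn.Open();
        cmd.ExecuteNonQuery();
    }
}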
Achievable insert speed is also limited by the speed of your log disk. Make sure the DB log is on a disk that's separate from the DB data. Ensuring that the log is de-fragmented and pre-grown to the size you need will also help.
Using SqlBulkCopy is another option, which can also help minimize the load on your DB log, depending on how it's configured.
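As a minimal sketch, assuming you stage the rows in a DataTable whose columns line up with dbo.TestResults:

public static void BulkLoadTestResults(DataTable table, string connectionString)
{
    // TableLock can help SQL Server minimally log the load, depending on the recovery model.
    using (var bulk = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
    {
        bulk.DestinationTableName = "dbo.TestResults";
        bulk.BatchSize = 10000;                      // rows sent to the server per batch
        bulk.ColumnMappings.Add("TestName", "TestName");
        bulk.ColumnMappings.Add("Result", "Result");
        bulk.ColumnMappings.Add("NonNumericResult", "NonNumericResult");
        bulk.ColumnMappings.Add("QuickLabDumpid", "QuickLabDumpid");
        bulk.WriteToServer(table);
    }
}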
In addition, since you're inserting 100M rows in one go, you might consider dropping any indexes on the table before you start and re-adding them once you're done. Otherwise, if you don't insert rows in clustered-index order, the clustered index will become fragmented very quickly, and every nonclustered index effectively turns each insert into the main table into an extra insert into another table, on top of the fragmentation problem.
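For example (the index name IX_TestResults_TestName is purely hypothetical, substitute your own, and leave the clustered index alone since disabling it makes the table unreadable), you could disable nonclustered indexes before the load and rebuild them afterwards:

public static void LoadWithIndexesDisabled(string connectionString, Action<SqlConnection> load)
{
    using (var cn = new SqlConnection(connectionString))
    {
        cn.Open();
        RunSql(cn, "ALTER INDEX IX_TestResults_TestName ON dbo.TestResults DISABLE;");
        load(cn);                                    // do the bulk insert here
        RunSql(cn, "ALTER INDEX IX_TestResults_TestName ON dbo.TestResults REBUILD;");
    }
}

private static void RunSql(SqlConnection cn, string sql)
{
    using (var cmd = new SqlCommand(sql, cn))
        cmd.ExecuteNonQuery();
}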