Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Slow insert performance with large amounts of data (SQL Server / C#)

Tags:

c#

sql-server

I'm working with electronic equipment that digitizes waveforms in real-time (each device generates around 1000 512 byte arrays per second - we have 12 devices). I've written a client for these devices in C# that for the most part works fine and has no performance issues.

However, one of the requirements for the application is archival, and Microsoft SQL Server 2010 was mandated as the storage mechanism (outside of my control). The database layout is very simple: there is one table per device per day ("Archive_Dev02_20131015" etc). Each table has an Id column, a timestamp column, a Data column (varbinary) and 20 more integer columns with some metadata. There's a clustered primary key on Id and timestamp, and another separate index on timestamp. My naive approach was queue all data in the client application, and then inserting everything into the database in 5 second intervals using SqlCommand.

The basic mechanism looks like this:

using (SqlTransaction transaction = connection.BeginTransaction()
{
    //Beginning of the insert sql statement...
    string sql = "USE [DatabaseName]\r\n" +
                 "INSERT INTO [dbo].[Archive_Dev02_20131015]\r\n" + 
                 "(\r\n" +
                 "   [Timestamp], \r\n" +
                 "   [Data], \r\n" +
                 "   [IntField1], \r\n" +
                 "   [...], \r\n" +                         
                 ") \r\n" +
                 "VALUES \r\n" +
                 "(\r\n" +
                 "   @timestamp, \r\n" + 
                 "   @data, \r\n" + 
                 "   @int1, \r\n" +
                 "   @..., \r\n" +  
                 ")";

    using (SqlCommand cmd = new SqlCommand(sql))
    {
        cmd.Connection = connection;
    cmd.Transaction = transaction;

    cmd.Parameters.Add("@timestamp", System.Data.SqlDbType.DateTime);
    cmd.Parameters.Add("@data", System.Data.SqlDbType.Binary);
    cmd.Parameters.Add("@int1", System.Data.SqlDbType.Int);

    foreach (var sample in samples)
    {
            cmd.Parameters[0].Value = amples.ReceiveDate;
            cmd.Parameters[1].Value = samples.Data;       //Data is a byte array
            cmd.Parameters[1].Size  = samples.Data.Length;
            cmd.Parameters[2].Value = sample.IntValue1;
             ...

            int affected = cmd.ExecuteNonQuery();

            if (affected != 1)
            {
                throw new Exception("Could not insert sample into the database!");
            }
          }
       }
   }

   transaction.Commit();                
}       

To summarize: a batch of 1 transaction with a loop that generates insert statements and executes them.

This method turned out to be very, very slow. On my machine (i5-2400 @ 3.1GHz, 8GB RAM, using .NET 4.0 and SQL Server 2008, 2 internal HDs in mirror, everything runs locally), it takes about 2,5 seconds to save the data from 2 devices, so saving 12 devices each 5 seconds is impossible.

To compare, I've written a small SQL script (actually I extracted the code C# runs with the sql server profiler) that does the same directly on the server (still running on my own machine):

set statistics io on
go

begin transaction
go

declare @i int = 0;

while @i < 24500 begin
SET @i = @i + 1

exec sp_executesql N'USE [DatabaseName]                                                                
INSERT INTO [dbo].[Archive_Dev02_20131015]                                                      
(                                                                                      
   [Timestamp],                                                                        
   [Data],                                                                             
   [int1],                                                                       
    ...                                                    
   [int20]                                                                                
)                                                                                      
VALUES                                                                                 
(                                                                                      
   @timestamp,                                                                         
   @data,                                                                              
   @compressed,                                                                        
   @int1,                                                                           
   ...                                                                  
   @int20,                                                                   

)',N'@timestamp datetime,@data binary(118),@int1 int,...,@int20 int,',
@timestamp='2013-10-14 14:31:12.023',
@data=0xECBD07601C499625262F6DCA7B7F4AF54AD7E074A10880601324D8904010ECC188CDE692EC1D69472329AB2A81CA6556655D661640CCED9DBCF7DE7BEFBDF7DE7BEFBDF7BA3B9D4E27F7DFFF3F5C6664016CF6CE4ADAC99E2180AAC81F3F7E7C1F3F22FEEF5FE347FFFDBFF5BF1FC6F3FF040000FFFF,
@int=0,
...
@int20=0
end

commit transaction

This does (imo, but I'm probably wrong ;) ) the same thing, only this time I'm using 24500 iterations, to simulate the 12 devices at once. The query takes about 2 seconds. If I use the same amount of iterations as the C# version, the query runs in less than a second.

So my first question is: why does it run way faster on SQL server than in C#? Does this have anything to do with the connection (local tcp)?

To make matters more confusing (to me) this code runs twice as slow on the production server (IBM bladecenter, 32GB ram, fiber connection to SAN, ... filesystem operations are really fast). I've tried looking at the sql activity monitor and write performance never goes above 2MB/sec, but this might as well be normal. I'm a complete newbie to sql server (about the polar opposite of a competent DBA in fact).

Any ideas on how I can make the C# code more performant?

like image 967
wouter Avatar asked Oct 15 '13 10:10

wouter


2 Answers

By far the best approach for loading this sort of data is to use a table-valued parameter, and a stored procedure that takes the data. A really simple example of a table type and procedure that uses it would be:

CREATE TYPE [dbo].[StringTable]
AS TABLE ([Value] [nvarchar] (MAX) NOT NULL)
GO

CREATE PROCEDURE [dbo].[InsertStrings]
  @Paths [dbo].[StringTable] READONLY
AS
INSERT INTO [dbo].[MyTable] ([Value])
SELECT [Value] FROM @Paths
GO

Then the C# code would be something along the lines of (please bear in mind that I've typed this into the S/O editor so there might be typos):

private static IEnumerable<SqlDataRecord> TransformStringList(ICollection<string> source)
{
     if (source == null || source.Count == 0)
     {
         return null;
     }
     return GetRecords(source, 
                       () => new SqlDataRecord(new SqlMetaData("Value", SqlDbType.NVarChar, -1)), 
                       (record, value) => record.SetString(0, value));
}

private static IEnumerable<SqlDataRecord> GetRecords<T>(IEnumerable<T> source, Func<SqlDataRecord> factory, Action<SqlDataRecord, T> hydrator)
{
    SqlDataRecord dataRecord = factory();
    foreach (var value in source)
    {
        hydrator(dataRecord, value);
        yield return dataRecord;
    }
}

private InsertStrings(ICollection<string> strings, SqlConnection connection)
{
    using (var transaction = connection.BeginTransaction())
    {
        using (var cmd = new SqlCommand("dbo.InsertStrings"))
        {
            cmd.Connection = connection;
            cmd.Transaction = transaction;
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add(new SqlParameter("@Paths", SqlDbType.Structured) { Value = TransformStringList(strings) };
            cmd.ExecuteNonQuery();
        }
    }
}

This approach has speed that rivals SqlBulkCopy, but it also yields some better control through the ability to run the things that you're updating through a procedure, and also makes it a lot easier to deal with concurrency.

Edit -> Just for completeness, this approach works on SQL Server 2008 and up. Seeing as there isn't such a thing as SQL Server 2010 I thought I'd better mention that.

like image 61
Matt Whitfield Avatar answered Nov 15 '22 02:11

Matt Whitfield


In sql server,

CREATE TYPE [dbo].[ArchiveData]
AS TABLE (
    [Timestamp] [DateTime] NOT NULL,
    [Data] [VarBinary](MAX) NOT NULL,
    [IntField1] [Int] NOT NULL,
    [...] [Int] NOT NULL,
    [IntField20] NOT NULL)
GO

Then your code should be something like the code below. This code uses a Table Value Parameter to insert all pending data at once, is a single transaction.

Note the ommission of the the slow and unecessaery USE DATABASE and the use of verbatim strings (@"") to make the code more readable.

// The insert sql statement.
string sql =
@"INSERT INTO [dbo].[Archive_Dev02_20131015] (
    [Timestamp],
    [Data],
    [IntField1],
    [...],                         
    [IntField20])
 SELECT * FROM @data;";

using (SqlCommand cmd = new SqlCommand(sql))
{
    using (SqlTransaction transaction = connection.BeginTransaction()
    {
        cmd.Connection = connection;
        cmd.Transaction = transaction;
        cmd.Parameters.Add(new SqlParameter("@data", SqlDbType.Structured)
            {
                Value = TransformSamples(samples);
            });

        int affected = cmd.ExecuteNonQuery();
        transaction.Commit();
    }
}

...

private static IEnumerable<SqlDataRecord> TransformSamples(
        {YourSampleType} samples)
{
    var schema = new[]
    {
        new SqlMetaData("Timestamp", SqlDbType.DateTime),
        new SqlMetaData("Timestamp", SqlDbType.VarBinary, -1),
        new SqlMetaData("IntField1", SqlDbType.Int),
        new SqlMetaData("...", SqlDbType.Int),
        new SqlMetaData("IntField20", SqlDbType.Int)
    };

    foreach (var sample in samples)
    {
        var row = new SqlDataRecord(schema);
        row.SetSqlDate(0, sample.ReceiveDate);
        row.SetSqlBinary(1, sample.Data);
        row.SetSqlInt(2, sample.Data.Length);
        row.SetSqlInt(..., ...);
        row.SetSqlInt(24, sample.IntValue19);
        yield return row;
    }
}
like image 39
Jodrell Avatar answered Nov 15 '22 04:11

Jodrell