
C# multiple parallel inserts in database

I have a DataTable with around 3000 rows. Each of those rows needs to be inserted into a database table. Currently, I am running a foreach loop as follows:

obj_AseCommand.CommandText = sql_proc;
obj_AseCommand.CommandType = CommandType.StoredProcedure;
obj_AseCommand.Connection = db_Conn;
obj_AseCommand.Connection.Open();

foreach (DataRow dr in dt.Rows)                
{
    obj_AseCommand.Parameters.AddWithValue("@a", dr["a"]);
    obj_AseCommand.Parameters.AddWithValue("@b", dr["b"]);
    obj_AseCommand.Parameters.AddWithValue("@c", dr["c"]);

    obj_AseCommand.ExecuteNonQuery();
    obj_AseCommand.Parameters.Clear();
}

obj_AseCommand.Connection.Close();

Can you please advise how I can execute the SP in parallel in the database? The above approach takes about 10 minutes to insert 3000 rows.

Harsh asked Jan 05 '15 06:01


1 Answer

Edit

In hindsight, using Parallel.ForEach to parallelize DB insertions is slightly wasteful, because each worker blocks a thread while its connection waits on the database. Arguably, a better approach is to use the asynchronous versions of the System.Data operations, such as ExecuteNonQueryAsync: start the executions concurrently, then use await Task.WhenAll() to wait for them all to complete. This avoids tying up caller threads, although the overall DB throughput is unlikely to be any higher.
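A minimal sketch of that asynchronous approach, assuming SQL Server via System.Data.SqlClient as in the original answer below. The names AsyncInsertSketch, ForEachThrottledAsync and InsertRowAsync are hypothetical, and the SemaphoreSlim throttle is an addition here, playing the role that MaxDegreeOfParallelism plays for Parallel.ForEach:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncInsertSketch
{
    // Runs `work` for every item, with at most `maxConcurrency` calls in flight at once.
    public static async Task ForEachThrottledAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync().ConfigureAwait(false);
                try { await work(item).ConfigureAwait(false); }
                finally { gate.Release(); }
            }).ToList(); // ToList starts every task; the semaphore does the throttling

            // Completes only once every task has finished (or faulted).
            await Task.WhenAll(tasks).ConfigureAwait(false);
        }
    }

    // Executes the stored procedure once for one row, on its own pooled connection.
    // Parameter names @a, @b, @c follow the question; connectionString is an assumption.
    public static async Task InsertRowAsync(
        string connectionString, string procName, DataRow row)
    {
        using (var con = new SqlConnection(connectionString))
        using (var cmd = con.CreateCommand())
        {
            cmd.CommandText = procName;
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@a", row["a"]);
            cmd.Parameters.AddWithValue("@b", row["b"]);
            cmd.Parameters.AddWithValue("@c", row["c"]);

            await con.OpenAsync().ConfigureAwait(false);
            await cmd.ExecuteNonQueryAsync().ConfigureAwait(false);
        }
    }
}
```

From an async method, this could be called as await AsyncInsertSketch.ForEachThrottledAsync(dt.Rows.Cast<DataRow>(), 4, r => AsyncInsertSketch.InsertRowAsync(connectionString, sql_proc, r)). Opening a connection per row relies on ADO.NET connection pooling to keep that cheap; reusing one connection per in-flight slot is an alternative if pooling is disabled.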

Original Answer, multiple Parallel Inserts into Database

You can do this in parallel using the TPL, specifically with the localInit overload of Parallel.ForEach. You will almost certainly want to throttle the amount of parallelism by tweaking MaxDegreeOfParallelism so that you don't inundate your database:

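// DataRowCollection is a non-generic IEnumerable, so cast it (requires using System.Linq)
Parallel.ForEach(dt.Rows.Cast<DataRow>(),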
    // Adjust this for optimum throughput vs minimal impact to your other DB users
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    () =>
    {
        // connectionString is assumed to hold your connection string
        var con = new SqlConnection(connectionString);
        var cmd = con.CreateCommand();
        cmd.CommandText = sql_proc;
        cmd.CommandType = CommandType.StoredProcedure;
        con.Open();

        cmd.Parameters.Add(new SqlParameter("@a", SqlDbType.Int));
        // NB : Size sensitive parameters must have size
        cmd.Parameters.Add(new SqlParameter("@b", SqlDbType.VarChar, 100));
        cmd.Parameters.Add(new SqlParameter("@c", SqlDbType.Bit));
        // Prepare won't help with SPROCs but can improve plan caching for adhoc sql
        // cmd.Prepare();
        return new {Conn = con, Cmd = cmd};
    },
    (dr, pls, localInit) =>
    {
        // Assign to .Value; the indexer returns the SqlParameter itself
        localInit.Cmd.Parameters["@a"].Value = dr["a"];
        localInit.Cmd.Parameters["@b"].Value = dr["b"];
        localInit.Cmd.Parameters["@c"].Value = dr["c"];
        localInit.Cmd.ExecuteNonQuery();
        return localInit;
    },
    (localInit) =>
    {
        localInit.Cmd.Dispose();
        localInit.Conn.Dispose();
    });

Notes:

  • Unless you really know what you are doing, it is generally best to leave the TPL to decide on the degree of parallelism. However, depending on how much contention there is for resources (read: locks for database work), restricting the upper limit of concurrent tasks may be required. Trial and error may be useful, e.g. try 4, 8 and 16 concurrent tasks to see which gives the most throughput, and monitor the locking and CPU load on your SQL Server.
  • Similarly, leaving TPL's default partitioner is usually good enough to partition the DataRows across the tasks.
  • Each Task will need its own separate SqlConnection.
  • Rather than creating and disposing the command on each call, create it once per task and then keep reusing the same Command, just updating parameters each time.
  • Use the localInit / localFinally lambdas to do per-task setup and cleanup, like disposing commands and connections.
  • You could also consider using .Prepare() if you are using ad hoc SQL or SQL Server versions prior to 2005.
  • I'm assuming enumerating a DataTable's rows is thread safe. You'll want to double check this of course.
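The trial-and-error tuning mentioned in the notes could be sketched as a small timing harness. ParallelismTrial and Measure are hypothetical names; runBatch stands for whatever wraps the Parallel.ForEach call above with a given MaxDegreeOfParallelism:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class ParallelismTrial
{
    // Times `runBatch` once per candidate degree and returns the elapsed time per degree.
    public static IDictionary<int, TimeSpan> Measure(
        IEnumerable<int> degrees, Action<int> runBatch)
    {
        var results = new Dictionary<int, TimeSpan>();
        foreach (var degree in degrees)
        {
            var sw = Stopwatch.StartNew();
            runBatch(degree); // e.g. run the parallel insert with this degree
            sw.Stop();
            results[degree] = sw.Elapsed;
        }
        return results;
    }
}
```

For example, ParallelismTrial.Measure(new[] { 4, 8, 16 }, degree => InsertAll(dt, degree)), where InsertAll is an assumed wrapper around the loop. Each run inserts the same rows again, so this is best done against a test database, and a single run per degree only gives a rough comparison.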

Side Note:

10 minutes for 3000 rows is excessive even with a wide table and a single thread. What does your proc do? I've assumed the processing isn't trivial, hence the need for the SPROC, but if you are just doing simple inserts, as per @3dd's comment, SqlBulkCopy will yield inserts of ~ 1M rows per minute on a reasonably narrow table.
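If the proc really is just a plain insert, the SqlBulkCopy route could look roughly like the sketch below. BulkInsertSketch, connectionString and destinationTable are assumptions, as is the premise that the destination columns are named a, b and c like the DataTable columns in the question:

```csharp
using System.Data;
using System.Data.SqlClient;

public static class BulkInsertSketch
{
    // Streams every row of `dt` into `destinationTable` in one bulk operation.
    public static void BulkInsert(DataTable dt, string connectionString, string destinationTable)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = destinationTable;
            bulk.BatchSize = 1000; // rows sent per batch; tune as needed

            // Map source columns to destination columns by name (assumed identical here).
            bulk.ColumnMappings.Add("a", "a");
            bulk.ColumnMappings.Add("b", "b");
            bulk.ColumnMappings.Add("c", "c");

            bulk.WriteToServer(dt);
        }
    }
}
```

Note that SqlBulkCopy targets SQL Server only. The AseCommand in the question suggests a Sybase ASE provider, for which this class would not apply as-is.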

StuartLC answered Oct 04 '22 23:10