How do I set DataAdapter.UpdateBatchSize to an "optimal" value?

I've finally gotten my insert batch to work, and now I've been fiddling with the size of the batch, but I can't see any difference in performance between a value of 50 and a value of 10000. This seems very odd to me, but I don't know what's happening behind the scenes, so it might be normal behavior.

I'm inserting 160k rows into a table and the mean time for my tested values is 115 +/- 2 secs. Without batching it takes 210 secs, so I'm quite satisfied with the improvement. The target table is:

CREATE TABLE [dbo].[p_DataIdeas](
    [wave] [int] NOT NULL,
    [idnumber] [int] NOT NULL,
    [ideaID] [int] NOT NULL,
    [haveSeen] [bit] NOT NULL CONSTRAINT [DF_p_DataIdeas_haveSeen]  DEFAULT ((0)),
  CONSTRAINT [PK_p_DataIdeas] PRIMARY KEY CLUSTERED 
(
  [wave] ASC,
  [idnumber] ASC,
  [ideaID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON
) ON [PRIMARY]
) ON [PRIMARY]
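
And the insert code looks roughly like this (simplified, with illustrative wiring and names rather than my exact code):

using System.Data;
using System.Data.SqlClient;

string connectionString = "<my connection string>";

var table = new DataTable();
table.Columns.Add("wave", typeof(int));
table.Columns.Add("idnumber", typeof(int));
table.Columns.Add("ideaID", typeof(int));
table.Columns.Add("haveSeen", typeof(bool));
// ... add the ~160k rows to 'table' ...

using (var conn = new SqlConnection(connectionString))
using (var adapter = new SqlDataAdapter())
{
    var insert = new SqlCommand(
        "INSERT INTO dbo.p_DataIdeas (wave, idnumber, ideaID, haveSeen) " +
        "VALUES (@wave, @idnumber, @ideaID, @haveSeen)", conn);
    insert.Parameters.Add("@wave", SqlDbType.Int, 0, "wave");
    insert.Parameters.Add("@idnumber", SqlDbType.Int, 0, "idnumber");
    insert.Parameters.Add("@ideaID", SqlDbType.Int, 0, "ideaID");
    insert.Parameters.Add("@haveSeen", SqlDbType.Bit, 0, "haveSeen");
    insert.UpdatedRowSource = UpdateRowSource.None; // required when UpdateBatchSize != 1

    adapter.InsertCommand = insert;
    adapter.UpdateBatchSize = 50;                   // the value I've been varying (50 ... 10000)

    conn.Open();
    adapter.Update(table);                          // sends the inserts in batches
}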

I read What to look for when setting UpdateBatchSize and the answer was to simply test a couple of different values. I can understand that, but shouldn't it be possible to calculate, or at least guesstimate, a good value if you know the table design, the SQL statement and the data that's about to be inserted?

Are there any best practices out there that someone can recommend?

Asked Aug 25 '10 by Fredrik Norlin

2 Answers

You can see the effect of batching either by looking at SQL Profiler or by calling SqlConnection.RetrieveStatistics(). What you should see is that each batch corresponds to a single round-trip to the DB.
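
For example, something along these lines (reusing the connection, adapter and DataTable from your existing code) will print the round-trip count for one Update() call:

conn.StatisticsEnabled = true;    // must be enabled before the work you want to measure
conn.ResetStatistics();

adapter.Update(table);

var stats = conn.RetrieveStatistics();
Console.WriteLine("Server round trips: " + stats["ServerRoundtrips"]);
Console.WriteLine("Execution time (ms): " + stats["ExecutionTime"]);
// With 160k rows and UpdateBatchSize = 50 you should see roughly 160000 / 50 round trips.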

As far as how to optimize the batch size, a very rough rule of thumb is that performance tends to stop improving with batch sizes above about 50 -- in fact, sometimes larger batches can run more slowly than smaller ones. If I'm too busy to test, I generally start with a batch size of about 20 (unless I'm using table-valued parameters, where batches of up to 500 can be faster than smaller ones). However, the optimal number depends on things like the total size of the inserts (will they all fit in RAM?), how fast the disks that hold your DB log are, whether the log is on a drive/LUN of its own (big perf cost if it's not), etc.
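
If you do want to measure it yourself, a simple harness like this is usually enough (BuildDataTable() is a hypothetical helper that recreates your rows in the Added state; clear the target table between runs so the inserts don't collide with the primary key):

foreach (int size in new[] { 1, 20, 50, 100, 500, 1000 })
{
    DataTable rows = BuildDataTable();          // hypothetical helper: rebuilds the 160k rows
    adapter.UpdateBatchSize = size;

    var sw = System.Diagnostics.Stopwatch.StartNew();
    adapter.Update(rows);
    sw.Stop();

    Console.WriteLine("Batch size {0}: {1:F1} s", size, sw.Elapsed.TotalSeconds);
    // TRUNCATE TABLE dbo.p_DataIdeas here before the next pass
}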

The achievable speed is generally limited first by the number of round trips, then by transaction size, then by log disk speed (particularly whether sequential access is possible, or whether it's forced to random access by competition with other files on the same spindles), and finally by RAM. However, all of these factors are also inter-related to some extent.

The first step in improving the perf of your inserts would be to do them in transactions -- maybe one transaction for every batch or two. Beyond that, table-valued parameters are probably the next step, using a stored procedure that does INSERT INTO Table SELECT column FROM @TableArgument.
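
A rough sketch of that combination, with made-up type and procedure names (the one-time T-SQL setup is shown in the comments):

// One-time setup on the SQL Server side (illustrative names):
//
//   CREATE TYPE dbo.IdeaTableType AS TABLE
//       (wave int, idnumber int, ideaID int, haveSeen bit);
//
//   CREATE PROCEDURE dbo.p_InsertIdeas @rows dbo.IdeaTableType READONLY AS
//       INSERT INTO dbo.p_DataIdeas (wave, idnumber, ideaID, haveSeen)
//       SELECT wave, idnumber, ideaID, haveSeen FROM @rows;

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.p_InsertIdeas", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;

    var p = cmd.Parameters.Add("@rows", SqlDbType.Structured);
    p.TypeName = "dbo.IdeaTableType";
    p.Value = table;               // the DataTable; column order must match the table type

    conn.Open();
    using (var tx = conn.BeginTransaction())
    {
        cmd.Transaction = tx;
        cmd.ExecuteNonQuery();     // one round trip inserts everything in 'table'
        tx.Commit();
    }
}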

Answered by RickNZ


Although changing UpdateBatchSize will help to some extent, the basic approach of using a DataAdapter to update a lot of records is going to be slow. This is because, ultimately, the DataAdapter will generate a separate SQL statement (insert, update or delete) for each row. UpdateBatchSize only affects how many of those individual statements are sent to SQL Server in one T-SQL batch.

To get much bigger improvements in performance, you want SQL Server to insert/update/delete many records in one statement (typically using a JOIN of some sort). Table-valued parameters (as mentioned by RickNZ) are one way of doing this. Another possibility is using SqlBulkCopy (although you will typically need to use a staging table for this).
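
A rough sketch of the SqlBulkCopy route (the staging table and its name are illustrative; it needs the same columns as the target table):

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // 1. Bulk-load the rows into a staging table.
    using (var bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "dbo.p_DataIdeas_Staging";
        bulk.BatchSize = 5000;
        bulk.WriteToServer(table);   // 'table' is the DataTable holding the rows
    }

    // 2. Move them into the real table with one set-based statement.
    var move = new SqlCommand(
        "INSERT INTO dbo.p_DataIdeas (wave, idnumber, ideaID, haveSeen) " +
        "SELECT wave, idnumber, ideaID, haveSeen FROM dbo.p_DataIdeas_Staging;", conn);
    move.ExecuteNonQuery();
}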

Answered by Moe Sisko