 

SQL Server 2008 R2 DC Inserts Performance Change

I have noticed an interesting performance change that happens at around 1.5 million inserted values. Can someone give me a good explanation of why this is happening?

The table is very simple. It consists of (bigint, bigint, bigint, bit, varbinary(max)), and I have a clustered primary key index on the first three bigints. As the varbinary(max) data I insert only a boolean "true".

From that point on, performance seems pretty constant.

[Graph omitted. Legend: Y = time in ms, X = inserts in 10K batches.]

I am also curious about the fairly constant small (and sometimes very large) spikes I have on the graph.

Actual execution plan from before the spikes:

[execution plan image omitted]

Legend:
Table I am inserting into: TSMDataTable
1. BigInt: DataNodeID - foreign key
2. BigInt: TS - main timestamp
3. BigInt: CTS - modification timestamp
4. Bit: ICT - marks the last inserted value (increases read performance)
5. VarBinary(max): Data - currently holds only a boolean "true"
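
For reference, a rough sketch of the table as described above; the column names come from this legend, while the constraint name and nullability are assumptions:

    -- Sketch of TSMDataTable as described in the question (names/details assumed).
    CREATE TABLE dbo.TSMDataTable
    (
        DataNodeID  bigint          NOT NULL,  -- foreign key
        TS          bigint          NOT NULL,  -- main timestamp
        CTS         bigint          NOT NULL,  -- modification timestamp
        ICT         bit             NOT NULL,  -- marks the last inserted value
        Data        varbinary(max)  NULL,      -- currently holds only a boolean "true"
        CONSTRAINT PK_TSMDataTable PRIMARY KEY CLUSTERED (DataNodeID, TS, CTS)
    );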

Environment
It is local.
It is not sharing any resources.
It is a fixed-size database (large enough so it does not expand).
(Computer: 4 cores, 8 GB RAM, 7,200 RPM disk, Win 7)
(SQL Server 2008 R2 DC, processor affinity (cores 1, 2), 3 GB RAM)

Asked by Falcon, Sep 13 '11

1 Answer

Have you checked the execution plan once the time goes up? The plan may change depending on statistics. Since your data grows fast, the statistics will change, and that may trigger a different execution plan.

Nested loops are good for small amounts of data, but as you can see, the time grows with volume. The SQL query optimizer then probably switches to a hash or merge plan, which performs consistently for large volumes of data.

To confirm this theory quickly, try to disable statistics auto update and run your test again. You should not see the "bump" then.
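
For example, assuming the database is called TSMDB (the name is just a placeholder), auto-update of statistics can be toggled like this:

    -- Disable automatic statistics updates for the duration of the test
    -- (the database name is an assumption; use your own).
    ALTER DATABASE TSMDB SET AUTO_UPDATE_STATISTICS OFF;

    -- ... run the insert test ...

    -- Re-enable it afterwards.
    ALTER DATABASE TSMDB SET AUTO_UPDATE_STATISTICS ON;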

EDIT: Since Falcon confirmed that the performance change was due to statistics, we can work out the next steps.

I guess you insert rows one by one, correct? In that case (if you cannot bulk insert), you'll be much better off inserting into a heap work table and then, at regular intervals, moving the rows in bulk into the target table. This is because for each inserted row, SQL Server has to check for key duplicates and foreign key violations, run other checks, and sort and split pages all the time. If you can afford to postpone these checks until a little later, I think you'll get superb insert performance.
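
A rough sketch of that pattern, reusing the column names from the question (the staging table name and the batch transaction are only illustrative):

    -- Heap staging table: no PK, no indexes, no FKs, so single-row inserts stay cheap.
    CREATE TABLE dbo.TSMDataStaging
    (
        DataNodeID  bigint          NOT NULL,
        TS          bigint          NOT NULL,
        CTS         bigint          NOT NULL,
        ICT         bit             NOT NULL,
        Data        varbinary(max)  NULL
    );

    -- At regular intervals, move the accumulated rows into the real table in one batch.
    BEGIN TRANSACTION;

    INSERT INTO dbo.TSMDataTable (DataNodeID, TS, CTS, ICT, Data)
    SELECT DataNodeID, TS, CTS, ICT, Data
    FROM dbo.TSMDataStaging;

    TRUNCATE TABLE dbo.TSMDataStaging;

    COMMIT TRANSACTION;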

I used this method for metrics logging. Logging went into a plain heap table with no indexes, no foreign keys, and no checks. Every ten minutes I would create a new table of this kind, then with two sp_renames inside a transaction (a swift swap) make the full table available for processing while the new table took over the logging. You then have the comfort of doing all the checking, sorting, and splitting only once, in bulk.
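
A minimal sketch of that swap, assuming a logging table named MetricsLog and a freshly created, structurally identical MetricsLog_New (both names are hypothetical):

    -- Swap the active logging table for a fresh one inside a single transaction,
    -- so the switch is effectively instantaneous for the writers.
    BEGIN TRANSACTION;

    EXEC sp_rename 'dbo.MetricsLog', 'MetricsLog_Full';   -- full table goes off for processing
    EXEC sp_rename 'dbo.MetricsLog_New', 'MetricsLog';    -- empty table takes over the logging

    COMMIT TRANSACTION;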

Apart from this, I'm not sure how to improve your situation. You certainly need to update statistics regularly, as that is key to good performance in general.
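
For instance, statistics can be refreshed from a scheduled job with either of these (FULLSCAN is just one sampling choice):

    -- Refresh statistics on the target table.
    UPDATE STATISTICS dbo.TSMDataTable WITH FULLSCAN;

    -- Or refresh all statistics in the current database.
    EXEC sp_updatestats;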

You might try using a single-column identity clustered key and an additional unique index on those three columns, but I'm doubtful it would help much.
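
That variant could look roughly like this (table, constraint, and index names are assumptions):

    -- Alternative layout: surrogate identity as the clustered key,
    -- with uniqueness of the three bigints enforced by a separate index.
    CREATE TABLE dbo.TSMDataTable2
    (
        ID          bigint IDENTITY(1,1) NOT NULL,
        DataNodeID  bigint               NOT NULL,
        TS          bigint               NOT NULL,
        CTS         bigint               NOT NULL,
        ICT         bit                  NOT NULL,
        Data        varbinary(max)       NULL,
        CONSTRAINT PK_TSMDataTable2 PRIMARY KEY CLUSTERED (ID)
    );

    CREATE UNIQUE NONCLUSTERED INDEX UX_TSMDataTable2_Key
        ON dbo.TSMDataTable2 (DataNodeID, TS, CTS);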

You might try padding the indexes if your inserted data are not sequential. That would eliminate excessive page splitting, shuffling, and fragmentation. You'll need to maintain the padding regularly, which may require some off-time.
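
Padding can be applied when rebuilding the index, for example (the fill factor value and the index name PK_TSMDataTable are only examples):

    -- Rebuild the clustered index leaving free space on each page to absorb
    -- out-of-order inserts; 80 is just an illustrative fill factor.
    ALTER INDEX PK_TSMDataTable ON dbo.TSMDataTable
        REBUILD WITH (FILLFACTOR = 80, PAD_INDEX = ON);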

You might try a hardware upgrade. You'll need to figure out which component is the bottleneck. It may be the CPU or the disk, my favourite suspect in this case. Memory is not likely, in my opinion, with one-by-one inserts. It should be easy to tell: if it's not the CPU (a usage line pinned at the top of the graph), then it's most likely your I/O holding you back. Try a better controller, or a better-cached and faster disk...

Answered by Robert Cutajar, Oct 21 '22