How to reduce SQL Server transaction log usage

We have an application that writes logs to Azure SQL tables. The table structure is the following:

CREATE TABLE [dbo].[xyz_event_history]
(
    [event_history_id] [uniqueidentifier] NOT NULL,
    [event_date_time] [datetime] NOT NULL,
    [instance_id] [uniqueidentifier] NOT NULL,
    [scheduled_task_id] [int] NOT NULL,
    [scheduled_start_time] [datetime] NULL,
    [actual_start_time] [datetime] NULL,
    [actual_end_time] [datetime] NULL,
    [status] [int] NOT NULL,
    [log] [nvarchar](max) NULL,

    CONSTRAINT [PK__crg_scheduler_event_history] PRIMARY KEY NONCLUSTERED 
    (
        [event_history_id] ASC
    )
)

The table is stored as a clustered index on the scheduled_task_id column (non-unique):

CREATE CLUSTERED INDEX [IDX__xyz_event_history__scheduled_task_id] ON [dbo].[xyz_event_history]
(
    [scheduled_task_id] ASC
)

The event_history_id is generated by the application and is a random (not sequential) GUID. The application creates entities, updates them and removes old ones from the table. The log column usually holds 2-10 KB of data, but it can grow up to 5-10 MB in some cases. Items are usually accessed by PK (event_history_id), and the most frequent sort order is event_date_time desc.

The problem we see after lowering the performance tier of the Azure SQL database to "S3" (100 DTUs) is that we cross the transaction log rate limits. This can be clearly seen in the sys.dm_exec_requests view - there are requests with wait type LOG_RATE_GOVERNOR (msdn):

Occurs when DB is waiting for quota to write to the log.
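
For reference, throttled requests can be spotted with a query along these lines (a sketch, not necessarily the exact query we used):

SELECT session_id, status, command, wait_type, wait_time
FROM sys.dm_exec_requests
WHERE wait_type = N'LOG_RATE_GOVERNOR';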

The operations I've noticed that have the biggest impact on log rate are deletions from xyz_event_history and updates to the log column. The updates are made in the following fashion:

UPDATE xyz_event_history
SET [log] = COALESCE([log], '') + @log_to_append
WHERE event_history_id = @id

The recovery model for Azure SQL databases is FULL and cannot be changed.

Here are the physical index statistics - many records exceed the 8 KB in-row limit:

TableName           AllocUnitTp PgCt    AvgPgSpcUsed        RcdCt   MinRcdSz    MaxRcdSz
xyz_event_history   IN_ROW_DATA 4145    47.6372868791698    43771   102         7864
xyz_event_history   IN_ROW_DATA 59      18.1995058067705    4145    11          19
xyz_event_history   IN_ROW_DATA 4       3.75277983691623    59      11          19
xyz_event_history   IN_ROW_DATA 1       0.914257474672597   4       11          19
xyz_event_history   LOB_DATA    168191  97.592290585619     169479  38          8068
xyz_event_history   IN_ROW_DATA 7062    3.65090190264393    43771   38          46
xyz_event_history   IN_ROW_DATA 99      22.0080800593032    7062    23          23
xyz_event_history   IN_ROW_DATA 1       30.5534964170991    99      23          23
xyz_event_history   IN_ROW_DATA 2339    9.15620212503089    43771   16          38
xyz_event_history   IN_ROW_DATA 96      8.70488015814184    2339    27          27
xyz_event_history   IN_ROW_DATA 1       34.3711391153941    96      27          27
xyz_event_history   IN_ROW_DATA 1054    26.5034840622683    43771   28          50
xyz_event_history   IN_ROW_DATA 139     3.81632073140598    1054    39          39
xyz_event_history   IN_ROW_DATA 1       70.3854707190511    139     39          39
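
(These figures appear to come from sys.dm_db_index_physical_stats; a query roughly like the one below - the exact parameters are an assumption - produces the table above.)

SELECT OBJECT_NAME(ps.[object_id])        AS TableName,
       ps.alloc_unit_type_desc            AS AllocUnitTp,
       ps.page_count                      AS PgCt,
       ps.avg_page_space_used_in_percent  AS AvgPgSpcUsed,
       ps.record_count                    AS RcdCt,
       ps.min_record_size_in_bytes        AS MinRcdSz,
       ps.max_record_size_in_bytes        AS MaxRcdSz
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.xyz_event_history'), NULL, NULL, 'DETAILED') AS ps;
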
  • Is there a way to reduce transaction log usage?
  • How does SQL Server log update transactions such as the one in the example above? Is it just the "old" plus the "new" value? (That would conceivably make frequently appending small pieces of data quite inefficient in terms of transaction log size; see the fn_dblog sketch below.)
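
One way to see what actually gets logged is the (undocumented but widely used) fn_dblog function; a sketch, assuming it is run right after a test UPDATE:

SELECT [Current LSN], Operation, AllocUnitName, [Log Record Length]
FROM sys.fn_dblog(NULL, NULL)
ORDER BY [Current LSN];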

UPDATE (April 20): I've run some experiments based on the suggestions in the answers and was impressed by the difference that using INSERT instead of UPDATE makes.

As per the following MSDN article about SQL Server transaction log internals (https://technet.microsoft.com/en-us/library/jj835093(v=sql.110).aspx):

Log records for data modifications record either the logical operation performed or they record the before and after images of the modified data. The before image is a copy of the data before the operation is performed; the after image is a copy of the data after the operation has been performed.

This automatically makes the UPDATE ... SET X = X + 'more' scenario highly inefficient in terms of transaction log usage - it requires capturing the "before image" of the ever-growing value.

I've created a simple test suite to compare the original way of appending data to the "log" column with the approach of just inserting each new piece of data into a new table. The results I got were quite astonishing (at least for me, as someone not too experienced with SQL Server).

The test is simple: append a 1,024-character chunk of log 5,000 times - only about 5 MB of text in total (not as much as one might think).
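
A minimal sketch of the UPDATE variant of the test (the seed row, variable names and loop structure are my assumptions, not the original harness; the INSERT variant writes one new row per chunk into a separate table instead):

DECLARE @id uniqueidentifier = NEWID();
DECLARE @chunk nvarchar(1024) = REPLICATE(N'x', 1024);
DECLARE @i int = 0;

-- single row that will accumulate the whole log
INSERT INTO dbo.xyz_event_history (event_history_id, event_date_time, instance_id, scheduled_task_id, status)
VALUES (@id, GETUTCDATE(), NEWID(), 1, 0);

WHILE @i < 5000
BEGIN
    UPDATE dbo.xyz_event_history
    SET [log] = COALESCE([log], N'') + @chunk
    WHERE event_history_id = @id;

    SET @i += 1;
END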

FULL recovery mode, SQL Server 2014, Windows 10, SSD
                      UPDATE         INSERT
Duration               07:48 (!)      00:02
Data file growth        ~8MB           ~8MB
Tran. log growth      ~218MB (!)        0MB (why?!)

(chart: items processed over time during the UPDATE test, referenced in question 1 below)

Just 5,000 updates that each append 1 KB of data can tie up SQL Server for 8 minutes (wow!) - I didn't expect that!

I think the original question is resolved at this point, but the following new ones have been raised:

  1. Why does the transaction log growth look linear (not quadratic, as one would expect if full "before" and "after" images were captured on every append)? From the diagram we can see that the number of items processed grows roughly as the square root of elapsed time - which is what you'd expect if the per-item overhead grows linearly with the number of items already inserted.
  2. Why, in the INSERT case, does the transaction log appear to have the same size as before any inserts at all? I took a look at the transaction log (with Dell's Toad) for the INSERT case, and it looks like only the last 297 items are in there - conceivably the transaction log got truncated, but why, if the database is in FULL recovery mode?

UPDATE (April 21). DBCC LOGINFO output for the INSERT case - before and after. The physical size of the log file matches the output - exactly 1,048,576 bytes on disk. Why does it look like the transaction log is standing still?

Before:

RecoveryUnitId  FileId  FileSize    StartOffset FSeqNo  Status  Parity  CreateLSN
0               2       253952      8192        131161  0       64      0
0               2       253952      262144      131162  2       64      0
0               2       253952      516096      131159  0       128     0
0               2       278528      770048      131160  0       128     0

After:

RecoveryUnitId  FileId  FileSize    StartOffset FSeqNo  Status  Parity  CreateLSN
0               2       253952      8192        131221  0       128     0
0               2       253952      262144      131222  0       128     0
0               2       253952      516096      131223  2       128     0
0               2       278528      770048      131224  2       128     0
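
(For completeness, the snapshots above were produced with the command below; the database name is a placeholder. Status = 2 marks a VLF that is still active, Status = 0 one that can be reused - consistent with the log simply wrapping around within the same four VLFs.)

DBCC LOGINFO (N'TestDb');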

For those who are interested, I've recorded "sqlservr.exe" activity using Process Monitor - I can see the log file being overwritten again and again - it looks like SQL Server treats old log records as no longer needed for some reason: https://dl.dropboxusercontent.com/u/1323651/stackoverflow-sql-server-transaction-log.pml.

UPDATE (April 24). It seems I've finally started to understand what is going on and want to share it with you. The reasoning above is true in general, but it has a serious caveat that also explains the confusing transaction log re-use with INSERTs.

A database behaves as if it were in SIMPLE recovery mode until the first full backup is taken (even though it is formally in FULL recovery mode).
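
So to get "real" FULL behaviour the log chain first has to be started with a full backup; a sketch (database name and path are placeholders):

BACKUP DATABASE [TestDb] TO DISK = N'C:\Backup\TestDb.bak' WITH INIT;

-- the recovery model and the reason the log cannot be truncated can then be checked with:
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'TestDb';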

So the numbers and the diagram above should be treated as valid for SIMPLE recovery mode, and I had to redo my measurements for real FULL recovery - the results are even more astonishing.

                      UPDATE         INSERT
Duration               13:20 (!)      00:02
Data file growth         8MB           11MB
Tran. log growth      55.2GB (!)       14MB

(chart: real FULL recovery mode UPDATE stats)

Asked Apr 16 '16 by Eugene D. Gubenkov


2 Answers

You are violating one of the basic tenets of normal form with the log field. The log field seems to be holding an append-only sequence of entries related to the primary key. The fix is to stop doing that.

1. Create a table: xyz_event_history_LOG (event_history_id, log_sequence#, log).

2. Stop doing updates to the log field in [xyz_event_history]; instead, do inserts into xyz_event_history_LOG.

The amount of data in your transaction log will decrease GREATLY.
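
A sketch of what that could look like (the column types and primary key are assumptions derived from the original table; @id, @next_sequence and @log_to_append stand for the application's parameters):

CREATE TABLE [dbo].[xyz_event_history_LOG]
(
    [event_history_id] [uniqueidentifier] NOT NULL,   -- references xyz_event_history
    [log_sequence#]    [int]              NOT NULL,   -- order of the appended chunks
    [log]              [nvarchar](max)    NULL,

    CONSTRAINT [PK__xyz_event_history_LOG] PRIMARY KEY ([event_history_id], [log_sequence#])
)

-- appending a chunk is now a plain insert instead of a read-modify-write update:
INSERT INTO [dbo].[xyz_event_history_LOG] ([event_history_id], [log_sequence#], [log])
VALUES (@id, @next_sequence, @log_to_append);

Reading the whole log back is then a SELECT ... ORDER BY [log_sequence#], concatenated on the client side or in SQL.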

Answered Oct 16 '22 by jerrylagrou


The transaction log contains all the changes to a database in the order they were made, so if you update a row multiple times you will get multiple entries for that row. It stores the entire value, old and new, so you are correct that many small updates to a large data type such as nvarchar(max) are inefficient; you would be better off storing the appended pieces as separate rows if they are only small values.

Answered Oct 16 '22 by Simon