We have an application that writes logs to Azure SQL tables. The structure of the table is the following:
CREATE TABLE [dbo].[xyz_event_history]
(
[event_history_id] [uniqueidentifier] NOT NULL,
[event_date_time] [datetime] NOT NULL,
[instance_id] [uniqueidentifier] NOT NULL,
[scheduled_task_id] [int] NOT NULL,
[scheduled_start_time] [datetime] NULL,
[actual_start_time] [datetime] NULL,
[actual_end_time] [datetime] NULL,
[status] [int] NOT NULL,
[log] [nvarchar](max) NULL,
CONSTRAINT [PK__crg_scheduler_event_history] PRIMARY KEY NONCLUSTERED
(
[event_history_id] ASC
)
)
The table is stored as a clustered index on the scheduled_task_id column (non-unique):
CREATE CLUSTERED INDEX [IDX__xyz_event_history__scheduled_task_id] ON [dbo].[xyz_event_history]
(
[scheduled_task_id] ASC
)
The event_history_id is generated by the application; it's a random (not sequential) GUID. The application creates, updates, and removes old entities from the table. The log column usually holds 2-10 KB of data, but it can grow up to 5-10 MB in some cases. Items are usually accessed by PK (event_history_id), and the most frequent sort order is event_date_time DESC.
The problem we see after lowering the performance tier of the Azure SQL database to "S3" (100 DTUs) is that we cross the transaction log rate limits. It can be clearly seen in sys.dm_exec_requests: there are records with the wait type LOG_RATE_GOVERNOR (msdn):
Occurs when DB is waiting for quota to write to the log.
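The easiest way to spot this is to query the DMVs directly (a minimal sketch; sys.dm_db_wait_stats is the per-database wait-stats view available in Azure SQL Database):

-- Requests currently throttled by the log rate governor
SELECT r.session_id, r.status, r.command, r.wait_type,
       r.wait_time AS wait_time_ms, r.total_elapsed_time AS elapsed_ms
FROM sys.dm_exec_requests AS r
WHERE r.wait_type = N'LOG_RATE_GOVERNOR';

-- Cumulative picture since the last stats reset
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_db_wait_stats
WHERE wait_type = N'LOG_RATE_GOVERNOR';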
The operations I've noticed that have a big impact on the log rate are deletions from xyz_event_history and updates to the log column. The updates are made in the following fashion:
UPDATE xyz_event_history
SET [log] = COALESCE([log], '') + @log_to_append
WHERE event_history_id = @id
The recovery model for Azure SQL databases is FULL and cannot be changed.
Here are the physical index statistics - many records approach the 8 KB per-row limit.
TableName          AllocUnitTp  PgCt     AvgPgSpcUsed        RcdCt    MinRcdSz  MaxRcdSz
xyz_event_history  IN_ROW_DATA  4145     47.6372868791698    43771    102       7864
xyz_event_history  IN_ROW_DATA  59       18.1995058067705    4145     11        19
xyz_event_history  IN_ROW_DATA  4        3.75277983691623    59       11        19
xyz_event_history  IN_ROW_DATA  1        0.914257474672597   4        11        19
xyz_event_history  LOB_DATA     168191   97.592290585619     169479   38        8068
xyz_event_history  IN_ROW_DATA  7062     3.65090190264393    43771    38        46
xyz_event_history  IN_ROW_DATA  99       22.0080800593032    7062     23        23
xyz_event_history  IN_ROW_DATA  1        30.5534964170991    99       23        23
xyz_event_history  IN_ROW_DATA  2339     9.15620212503089    43771    16        38
xyz_event_history  IN_ROW_DATA  96       8.70488015814184    2339     27        27
xyz_event_history  IN_ROW_DATA  1        34.3711391153941    96       27        27
xyz_event_history  IN_ROW_DATA  1054     26.5034840622683    43771    28        50
xyz_event_history  IN_ROW_DATA  139      3.81632073140598    1054     39        39
xyz_event_history  IN_ROW_DATA  1        70.3854707190511    139      39        39
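(These numbers come from a query along the lines of the sketch below; the column headers above are abbreviations of the DMV's output names.)

SELECT OBJECT_NAME(ips.object_id)           AS TableName,
       ips.alloc_unit_type_desc             AS AllocUnitTp,
       ips.page_count                       AS PgCt,
       ips.avg_page_space_used_in_percent   AS AvgPgSpcUsed,
       ips.record_count                     AS RcdCt,
       ips.min_record_size_in_bytes         AS MinRcdSz,
       ips.max_record_size_in_bytes         AS MaxRcdSz
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.xyz_event_history'),
                                    NULL, NULL, 'DETAILED') AS ips;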
UPDATE (April 20):
I've experimented with the suggestions from the answers and was impressed by the difference that using INSERT instead of UPDATE makes.
As per the following MSDN article about SQL Server transaction log internals (https://technet.microsoft.com/en-us/library/jj835093(v=sql.110).aspx):
Log records for data modifications record either the logical operation performed or they record the before and after images of the modified data. The before image is a copy of the data before the operation is performed; the after image is a copy of the data after the operation has been performed.
This automatically makes the UPDATE ... SET X = X + 'more' scenario highly inefficient in terms of transaction log usage: it requires capturing the "before image".
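One way to see this effect on a single statement (a rough sketch; it relies on sys.dm_tran_database_transactions reporting the log bytes generated by the current transaction, and it rolls back so nothing is persisted):

DECLARE @id uniqueidentifier = (SELECT TOP (1) event_history_id FROM xyz_event_history);

BEGIN TRANSACTION;

-- One 1 KB append, same shape as the application's statement
UPDATE xyz_event_history
SET [log] = COALESCE([log], '') + REPLICATE(N'x', 1024)
WHERE event_history_id = @id;

-- Log generated by this transaction so far
SELECT dt.database_transaction_log_bytes_used,
       dt.database_transaction_log_record_count
FROM sys.dm_tran_database_transactions AS dt
JOIN sys.dm_tran_session_transactions AS st
    ON st.transaction_id = dt.transaction_id
WHERE st.session_id = @@SPID
  AND dt.database_id = DB_ID();

ROLLBACK TRANSACTION;  -- measurement only, leave the row untouched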
I've created a simple test suite to compare the original way of appending data to the log column with an approach that just inserts each new piece of data into a separate table. The results were quite astonishing (at least to me, not being a very experienced SQL Server guy).
The test is simple: append a 1,024-character chunk of log 5,000 times - just 5 MB of text in total (not as much as one might think).
FULL recovery mode, SQL Server 2014, Windows 10, SSD
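A minimal sketch of the two test loops (illustrative, not the exact script; xyz_event_history_LOG in variant 2 is a hypothetical child table along the lines suggested in the answers):

-- Variant 1: 5,000 appends to the single nvarchar(max) column
DECLARE @i int = 0,
        @chunk nvarchar(1024) = REPLICATE(N'a', 1024),
        @id uniqueidentifier = NEWID();

INSERT INTO xyz_event_history (event_history_id, event_date_time, instance_id,
                               scheduled_task_id, status, [log])
VALUES (@id, GETUTCDATE(), NEWID(), 1, 0, N'');

WHILE @i < 5000
BEGIN
    UPDATE xyz_event_history
    SET [log] = [log] + @chunk
    WHERE event_history_id = @id;
    SET @i += 1;
END;

-- Variant 2: 5,000 inserts of the same chunks into a child table
SET @i = 0;
WHILE @i < 5000
BEGIN
    INSERT INTO xyz_event_history_LOG (event_history_id, log_sequence, [log])
    VALUES (@id, @i, @chunk);
    SET @i += 1;
END;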
                  UPDATE        INSERT
Duration          07:48 (!)     00:02
Data file growth  ~8 MB         ~8 MB
Tran. log growth  ~218 MB (!)   0 MB (why?!)
Just 5,000 updates that each append 1 KB of data can tie up SQL Server for 8 minutes (wow!) - I didn't expect that!
I think the original question is resolved at this point, but the following one was raised: why doesn't the transaction log grow at all with INSERT in FULL recovery mode?

UPDATE (April 21):
Here is the DBCC LOGINFO output for the INSERT case, before and after. The physical size of the log file matches the output - exactly 1,048,576 bytes on disk.
Why does it look like the transaction log stays the same size?
Before:
RecoveryUnitId  FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
0               2       253952    8192         131161  0       64      0
0               2       253952    262144       131162  2       64      0
0               2       253952    516096       131159  0       128     0
0               2       278528    770048       131160  0       128     0

After:
RecoveryUnitId  FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
0               2       253952    8192         131221  0       128     0
0               2       253952    262144       131222  0       128     0
0               2       253952    516096       131223  2       128     0
0               2       278528    770048       131224  2       128     0
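(The snapshots above were taken with plain DBCC LOGINFO; the on-disk size was cross-checked roughly like this - a sketch:)

-- VLF layout of the current database's log file
DBCC LOGINFO;

-- Physical log file size in bytes (sys.database_files reports size in 8 KB pages)
SELECT name, type_desc, size * 8192 AS size_bytes
FROM sys.database_files
WHERE type_desc = N'LOG';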
For those who are interested, I've recorded "sqlservr.exe" activity using Process Monitor - I can see the log file being overwritten again and again; it looks like SQL Server treats old log records as no longer needed for some reason: https://dl.dropboxusercontent.com/u/1323651/stackoverflow-sql-server-transaction-log.pml.
UPDATE (April 24). It seems I've finally started to understand what is going on there and want to share it with you. The reasoning above is true in general, but it has a serious caveat that also caused the confusion about the strange transaction log re-usage with INSERTs.
The database behaves as if it were in SIMPLE recovery mode until the first full backup is taken (even though it is in FULL recovery mode).
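On a local instance this is easy to verify (a sketch; the database name and backup path are placeholders):

-- Until the first full backup, the log is auto-truncated at checkpoints even
-- though the recovery model is FULL ("pseudo-simple" mode).
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'TestDb';          -- placeholder database name

-- The first full backup starts the log chain; from then on the log is only
-- truncated by log backups.
BACKUP DATABASE TestDb TO DISK = N'C:\Backups\TestDb.bak';   -- placeholder path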
We can treat the numbers and table above as valid for SIMPLE recovery mode, and I had to redo my measurements for real FULL recovery - they are even more astonishing.
                  UPDATE        INSERT
Duration          13:20 (!)     00:02
Data file growth  8 MB          11 MB
Tran. log growth  55.2 GB (!)   14 MB
You are violating one of the basic tenets of normal form with the log field. The log field seems to be holding an appended sequence of entries related to the primary row. The fix is to stop doing that.
1. Create a table: xyz_event_history_LOG(event_history_id, log_sequence#, log).
2. Stop doing updates to the log field in [xyz_event_history]; instead, do inserts into xyz_event_history_LOG.
The amount of data in your transaction log will decrease greatly.
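A sketch of what this could look like (the names follow the suggestion above; the clustered key on (event_history_id, log_sequence) is one reasonable choice, not the only one):

CREATE TABLE dbo.xyz_event_history_LOG
(
    event_history_id uniqueidentifier NOT NULL,
    log_sequence     int              NOT NULL,
    [log]            nvarchar(max)    NULL,
    CONSTRAINT PK_xyz_event_history_LOG
        PRIMARY KEY CLUSTERED (event_history_id, log_sequence)
);

-- Appending becomes a plain INSERT: only the new chunk is logged, no "before
-- image" of an ever-growing blob. @id, @next_sequence and @log_to_append stand
-- for the same parameters the application already passes.
INSERT INTO dbo.xyz_event_history_LOG (event_history_id, log_sequence, [log])
VALUES (@id, @next_sequence, @log_to_append);

-- Reassemble the full log only when it is actually read.
SELECT (SELECT l.[log] AS [text()]
        FROM dbo.xyz_event_history_LOG AS l
        WHERE l.event_history_id = @id
        ORDER BY l.log_sequence
        FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)') AS full_log;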
The transaction log contains all the changes to a database in the order they were made, so if you update a row multiple times you will get multiple entries for that row. It does store the entire value, old and new, so you are correct that multiple small updates to a large data type such as nvarchar(max) are inefficient; you would be better off storing the updates in separate columns if they are only small values.