
SQL Server Latches and their indication of performance issues

I am trying to understand a potential performance issue with our database (SQL Server 2008), and in particular one performance counter: SQLServer:Latches\Total Latch Wait Time (ms). We are seeing a slowdown in DB response times, and the only correlating spike I can match it with is a spike in Total Latch Wait Time and Latch Waits/sec. I am not seeing any particular bottleneck in disk I/O, CPU usage, or memory.

The common explanation of a SQL Server latch is that it is a lightweight lock, but I am trying to get a more detailed understanding of what a latch is, how it differs from a lock, and what the high number of latch waits I am seeing may indicate.

John Lemp asked Dec 14 '09 21:12



2 Answers

This may be a really basic error to a professional DBA, but this is what we found with our high latch problem. Since this thread ranks very high in search results, I thought I'd share our findings in case they help someone else.

On newer dual- or multi-processor servers using a NUMA memory architecture, the max degree of parallelism should be set to the number of physical cores per processor. In our case we had dual Xeons with 4 cores each, which with hyper-threading appear to SQL Server as 16 logical processors.

Changing this value from the default of 0 to 4 immediately cut the high latch wait times on some queries.

Our latch waits had been running at 1,000 ms or more, and up to 30,000 ms on some occasions.
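
As a rough sketch of the change described above: the instance-wide setting can be changed with sp_configure. The value 4 is simply the figure from this answer (4 physical cores per processor); it is an assumption that would need adjusting for other hardware, and it is worth testing on a non-production instance first.

```sql
-- 'max degree of parallelism' is an advanced option, so expose it first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Default is 0 (use all available schedulers). 4 is the value used in
-- this answer; treat it as an assumption for your own hardware.
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;

-- Running sp_configure with only the option name shows the
-- configured and running values.
EXEC sp_configure 'max degree of parallelism';
```

The setting takes effect after RECONFIGURE without a restart. Individual queries can still override it with the MAXDOP query hint.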

michael x answered Oct 21 '22 22:10


I recommend you look into sys.dm_os_latch_stats and see which latch classes show increased contention and wait times compared to a previous baseline.
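
One possible way to make that comparison, if no earlier baseline was saved, is to snapshot the DMV, wait a while, and diff the two readings. This is only a sketch: the temp table name #latch_baseline and the one-minute interval are arbitrary choices, not something prescribed by SQL Server.

```sql
-- Capture the current cumulative counters as a baseline.
SELECT latch_class, waiting_requests_count, wait_time_ms
INTO #latch_baseline
FROM sys.dm_os_latch_stats;

-- Let the workload run for a sample interval (arbitrary: 1 minute).
WAITFOR DELAY '00:01:00';

-- Show the latch classes that accumulated the most wait time since the baseline.
SELECT TOP (10)
       s.latch_class,
       s.waiting_requests_count - b.waiting_requests_count AS waits_delta,
       s.wait_time_ms - b.wait_time_ms AS wait_ms_delta
FROM sys.dm_os_latch_stats AS s
JOIN #latch_baseline AS b
  ON b.latch_class = s.latch_class
ORDER BY wait_ms_delta DESC;

DROP TABLE #latch_baseline;
```

The counters in this DMV are cumulative since the last restart (or since they were last cleared), which is why a delta over a known interval is usually more telling than the raw totals.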

If you see a spike in the BUFFER latch class, it means the waits are driven by updates conflicting while trying to modify the same page. The other latch classes also have short explanations on MSDN that can guide you toward the root cause. For those marked 'internal use only', you're going to have to open a support case with Microsoft, as a detailed explanation of what they mean is on the verge of NDA.

You should also look into sys.dm_os_wait_stats. If you see an increase in PAGELATCH_* waits, it is the same problem as the BUFFER latch class above: contention from trying to modify the same page, also known as an update hot spot. If you see an increase in PAGEIOLATCH_* waits, then your problem is the I/O subsystem: it takes too long to load pages into memory when they are needed.
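
A minimal query for checking those two wait families might look like the following. The column list is a selection from the DMV; as with the latch statistics, the numbers are cumulative, so they are best compared against an earlier reading.

```sql
-- PAGELATCH_*   : waits on latches for pages already in memory (page contention).
-- PAGEIOLATCH_* : waits on latches for pages being read from or written to disk (I/O).
-- [_] matches a literal underscore, which is otherwise a single-character wildcard in LIKE.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       max_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGELATCH[_]%'
   OR wait_type LIKE 'PAGEIOLATCH[_]%'
ORDER BY wait_time_ms DESC;
```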

Remus Rusanu answered Oct 21 '22 23:10