Saturday, December 7, 2024

Latch Contention

A latch is like a lock on a piece of memory. As more threads get involved, they will start to compete to access the same pieces of memory, causing blocking. Blocking due to latch contention is exhibited in waits; but unlike a lock, a latch can be released as soon as the physical operation is completed.  

The main sources of data about latches are two DMVs called sys.dm_os_wait_stats and sys.dm_os_latch_stats. The DMVs are restarted when the service restarts, or when the DBCC SQLPERF command is called to clear them:  

 

DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);  

DBCC SQLPERF('sys.dm_os_latch_stats', CLEAR);  

 

Among the columns of the DMV sys.dm_os_wait_stats are ones called wait_type, wait_time_ ms and waiting_tasks_count. These three columns represent the wait type, the total number of milliseconds that have been spent waiting on this wait type, and the number of times this type of wait has occurred, respectively. Wait types that associate with latches start with LATCH_, PAGELATCH_, or PAGEIOLATCH_. Dividing the total wait time by the number of waits will give an average wait time. 

 

Measuring Spinlock Contention  

For the time being, you can think of a spinlock as a latch, except that if the memory needing access is not available, the spinlock will keep checking it (known as spinning) for a while. The main DMV for spinlocks is sys.dm_os_spinlock_stats. The metrics of concern for spinlocks are around collisions and spins_per_collision, which are both columns in this DMV, along with the name column for the type of spinlock. 

 

Contention Indicators  

Locks are used to maintain the ACID properties of a transaction, but latches are used to provide consistency over a memory operation. A spinlock should be considered as similar, in that a resource is needed for a particular operation, but is not available. Therefore, when you see higher than normal wait stats associated with latches and spinlocks, you may need to investigate further. However, only seeing higher than normal wait stats does not in and of itself indicate a contention problem. It is perfectly reasonable to expect that if your system is busier now than when you took your benchmark, your wait stats would be higher. You should look for the proportion of wait stats compared to others, and compare this to the throughput being achieved. You might also want to consider how many transactions per second you’re managing to achieve as you increase the load, and the effect of extra processors.  

One strong indicator of latch contention can be seen when the proportion of latch-related wait stats increases unreasonably as your throughput increases. For spinlock contention, a strong indicator is the number of spins per collision increasing, typically combined with increased CPU. Spinning is an activity that requires CPU effort, so if spinning increases disproportionally, CPU is likely to do the same. CPU may go up simply because of a busier system, but if the transactions per second counters are leveling off while CPU is increasing, then this would suggest the CPU is being used for something else — such as spinning. 

 

In SQL Server, you lock something to use it. A latch is similarly applied to a piece of memory when it is used. Contention occurs when a process tries to use a piece of memory and finds that another process has already acquired a latch on it. If SQL Server were using only one processor core, there shouldn’t be a problem (you can sit where you like if you’re the only person at home Increasingly, we’re seeing systems with a number of cores that seem ridiculous; and much like we used to talk about memory in megabytes rather than gigabytes, the numbers available now will seem tiny in the future. It is already common to see six- or eight-core processors. 

Furthermore, there are systems that encourage the use of a particular piece of memory. For example, when a lot of processes are trying to push data into a table, and that table always inserts data into the same page, latch contention could result. Database administrators don’t always know exactly what the applications that use their databases are trying to do, but they often have a good idea of which tables meet these conditions, and therefore whether they might be candidates for latch contention. 

No comments:

Post a Comment