Latches are lightweight synchronization primitives that are used by the SQL Server engine to guarantee consistency of in-memory structures including; index, data pages, and internal structures, such as non-leaf pages in a B-Tree. SQL Server uses buffer latches to protect pages in the buffer pool and I/O latches to protect pages not yet loaded into the buffer pool. Whenever data is written to or read from a page in the SQL Server buffer pool a worker thread must first acquire a buffer latch for the page. There are various buffer latch types available for accessing pages in the buffer pool including exclusive latch (PAGELATCH_EX) and shared latch (PAGELATCH_SH). When SQL Server attempts to access a page that is not already present in the buffer pool, an asynchronous I/O is posted to load the page into the buffer pool. If SQL Server needs to wait for the I/O subsystem to respond it will wait on an exclusive (PAGEIOLATCH_EX) or shared (PAGEIOLATCH_SH) I/O latch depending on the type of request; this is done to prevent another worker thread from loading the same page into the buffer pool with an incompatible latch. Latches are also used to protect access to internal memory structures other than buffer pool pages; these are known as Non-Buffer latches.
Contention on page latches is the most common scenario encountered on multi-CPU systems and so most of this article will focus on these. Latch contention occurs when multiple threads concurrently attempt to acquire incompatible latches to the same in-memory structure. As a latch is an internal control mechanism; the SQL engine automatically determines when to use them. Because the behavior of latches is deterministic, application decisions including schema design can affect this behavior.
How does SQL Server use latches?
A page in SQL Server is 8 KB and can store multiple rows. To increase concurrency and performance, buffer latches are held only for the duration of the physical operation on the page, unlike locks, which are held for the duration of the logical transaction.
Latches are internal to the SQL engine and are used to provide memory consistency, whereas locks are used by SQL Server to provide logical transactional consistency.
SQL Server latch modes and compatibility
Latch mode | KP | SH | UP | EX | DT |
KP | Y | Y | Y | Y | N |
SH | Y | Y | Y | N | N |
UP | Y | Y | N | N | N |
EX | Y | N | N | N | N |
DT | N | N | N | N | N |
Latch wait types
Cumulative wait information is tracked by SQL Server and can be accessed using the Dynamic Management View (DMW) sys.dm_os_wait_stats. SQL Server employs three latch wait types as defined by the corresponding wait_type in the sys.dm_os_wait_stats DMV:
Buffer (BUF) latch: used to guarantee consistency of index and data pages for user objects. They are also used to protect access to data pages that SQL Server uses for system objects. For example, pages that manage allocations are protected by buffer latches. These include the Page Free Space (PFS), Global Allocation Map (GAM), Shared Global Allocation Map (SGAM) and Index Allocation Map (IAM) pages. Buffer latches are reported in sys.dm_os_wait_stats with a wait_type of PAGELATCH_*.
Non-buffer (Non-BUF) latch: used to guarantee consistency of any in-memory structures other than buffer pool pages. Any waits for non-buffer latches will be reported as a wait_type of LATCH_*.
IO latch: a subset of buffer latches that guarantee consistency of the same structures protected by buffer latches when these structures require loading into the buffer pool with an I/O operation. IO latches prevent another thread loading the same page into the buffer pool with an incompatible latch. Associated with a wait_type of PAGEIOLATCH_*.
Note
If you see significant PAGEIOLATCH waits, it means that SQL Server is waiting on the I/O subsystem. While a certain amount of PAGEIOLATCH waits is expected and normal behavior, if the average PAGEIOLATCH wait times are consistently above 10 milliseconds (ms) you should investigate why the I/O subsystem is under pressure.
Factors affecting latch contention
Latch contention that hinders performance in OLTP environments is typically caused by high concurrency related to one or more of the following factors:
actor | Details |
High number of logical CPUs used by SQL Server | Latch contention can occur on any multi-core system. In SQLCAT experience excessive latch contention, which impacts application performance beyond acceptable levels, has most commonly been observed on systems with 16+ CPU cores and may increase as additional cores are made available. |
Schema design and access patterns | Depth of B-tree, clustered and non-clustered index design, size and density of rows per page, and access patterns (read/write/delete activity) are factors that can contribute to excessive page latch contention. |
High degree of concurrency at the application level | Excessive page latch contention typically occurs in conjunction with a high level of concurrent requests from the application tier. There are certain programming practices that can also introduce a high number of requests for a specific page. |
Layout of logical files used by SQL Server databases | Logical file layout can affect the level of page latch contention caused by allocation structures such as Page Free Space (PFS), Global Allocation Map (GAM), Shared Global Allocation Map (SGAM) and Index Allocation Map (IAM) pages. For more information, see TempDB Monitoring and Troubleshooting: Allocation Bottleneck. |
I/O subsystem performance | Significant PAGEIOLATCH waits indicate SQL Server is waiting on the I/O subsystem. |
Indicators of latch contention
Latch contention is only problematic when the contention and wait time associated with acquiring page latches prevents throughput from increasing when CPU resources are available. Determining an acceptable amount of contention requires a holistic approach that considers performance and throughput requirements together with available I/O and CPU resources.
The following measures of latch wait time are indicators that excessive latch contention is affecting application performance:
Average page latch wait time consistently increases with throughput: If average page latch wait times consistently increase with throughput and if average buffer latch wait times also increase above expected disk response times, you should examine current waiting tasks using the sys.dm_os_waiting_tasks DMV.
Percentage of total wait time spent on latch wait types during peak load: If the average latch wait time as a percentage of overall wait time increases in line with application load, then latch contention may be affecting performance and should be investigated.
Measure page latch waits and non-page latch waits with the SQLServer:Wait Statistics Object performance counters. Then compare the values for these performance counters to performance counters associated with CPU, I/O, memory, and network throughput. For example, transactions/sec and batch requests/sec are two good measures of resource utilization.
CPU Utilization does not increase as application workload increases: If the CPU utilization on the system does not increase as concurrency driven by application throughput increases, this is an indicator that SQL Server is waiting on something and symptomatic of latch contention.
No comments:
Post a Comment