Knowing SQL Server: Latches

A latch is an internal object, used by the SQL Server engine. It is not something that you, the database developer, can directly influence. If you need to get data from a particular page, the SQL Server engine needs to acquire a latch. You have no choice over this. Nor can you tell it what kind of latch to acquire — that’s determined by the SQL Server engine. The difference is that this is not just about the protection of data, it’s about the protection of server memory. Although you might be willing to tolerate dirty reads, and choose your locking strategy accordingly, you don’t have that luxury with latches. Spinlocks are a similar concept to latches, in that they are also lightweight synchronization primitives, but they act slightly differently. A lot of the effects can seem similar, and the kinds of systems that can exhibit spinlock contention are similar to those that can exhibit latch contention.

The main difference between a spinlock and a latch is this: If a thread fails to acquire a latch immediately, it yields, enabling the CPU to be used for other things. If a thread fails to acquire a spinlock, the thread starts looping (spinning), checking the resource repeatedly, with the expectation that it will become available soon. It won’t spin forever, though. After a bit of time, it will back off, at which point it yields to other processes on the CPU. Because we have no control over latching behavior or spinlocks, it isn’t possible to demonstrate the impact of various latch scenarios using a real system.

LATCH TYPES latches exist to protect in-memory data. There are hundreds of different types of latch, most of which you are unlikely to encounter in any meaningful way when you are working with SQL Server. While latch waits will occasionally show up in sys.dm_os_wait_ stats, you normally have to actively search for them.

Typically, latches are divided into two distinct categories in SQL Server. They either serve the buffer pool, in which case they are known as BUF latches (showing up as PAGELATCH or PAGEIOLATCH in sys.dm_os_wait_stats and aggregated into the BUFFER latch class in sys.dm_os_latch_stats), or they don’t, in which case they are grouped under the non-buffer (Non-BUF) heading.

The PAGEIOLATCH_ latch types are used when data is being moved from disk into RAM. An I/O operation is in play when a I/O latch is needed. In some ways, this is the easiest type latch wait to troubleshoot, as high PAGEIOLATCH wait times imply that the I/O subsystem cannot keep up. If this is the case, and you can’t mitigate the problem through I/O reduction or increased RAM, you have a nice argument for buying that faster storage you’ve been wanting.

LATCH MODES

If you query sys.dm_os_wait_stats as follows, you’ll see the different modes listed there.

SELECT * FROM sys.dm_os_wait_stats

where wait_type like 'PAGELATCH%';

Six latch modes are listed, usually in the following order: NL, KP, SH, UP, EX, DT.

NL NL is an internal Null latch. You don’t need to consider it. It essentially means no latch is being used, so it isn’t even recorded under normal conditions.

KP KP is a Keep latch, used to indicate that a particular page is needed for something and shouldn’t be destroyed. SH This refers to a Shared latch, which is needed to read the data from a page.

UP This is an Update latch, which indicates that a page is being updated, but not the table data within it. This is not related to the T-SQL UPDATE statement, which requires an Exclusive latch (the next mode discussed). Update latches are more common for internal operations, such as maintaining PFS pages or updating the checksum bits on a page. Because the type of data being updated is not needed to service queries, it is compatible with a shared latch, but not another Update latch.

EX When data is being explicitly changed or added, an Exclusive latch is required. This is the most common type of latch for troubleshooting purposes, as two EX latches cannot be held on the same page at the same time. While this is also true of UP latches, EX latches are the more common of the two.

DT The presence of this latch, the Destroy latch, means that the page is in the process of being removed from memory. A page that is deleted picks up a DT latch from the lazywriter process while the record of the page is removed. Bear in mind that this does not necessarily mean that the data is being deleted — it may simply be removed from the buffer cache, with a copy of the data still residing on the disk.

Grant Order

In any system, particularly as the number of processor threads grows, a number of requests will be queued for a particular page. For example, a number of pages might be inserting data into a table while others are reading that data, and the data may need to be moved from disk, and so on. For a page that has no latches on it, the first process that wants a latch will be granted one. That’s straightforward; but when more processes start coming along, the behavior is slightly different. A KP latch will skip the queue completely — unless there is a DT latch on the page, a KP latch will jump ahead and keep it alive.

Other latches will wait, joining the queue (even if there is compatibility between the two — another slight difference between lock behavior and latch behavior). When the current latch is released, the first latch in the queue can be granted, but here something special happens. Any other latch in the queue that is compatible with that first latch (which is being granted) will be allowed, even if there are incompatible locks in front of it. It’s like the nightclub bouncer who takes the first person in the queue but also looks through it for anyone else who can be let in. This way, the next latch type in line is always granted, but there’s an opportunity for other latches to jump in through the closing door at the same time. Typically, latches are taken out for short periods, so the incompatible latches shouldn’t have to wait for too long, depending on what’s going on. The algorithm might not seem fair, but it does make sure that concurrency can apply when possible.

sys.dm_os_wait_stats

sys.dm_os_latch_stats

sys.dm_os_spinlock_stats

SELECT * FROM sys.dm_os_performance_counters

WHERE object_name LIKE '%Latches%'

UP Latches

in tempdb It is possible that the resource your request is waiting on might be in tempdb, rather than the database you have designed. You can see this by looking at the wait_resource field in sys.dm_ exec_requests and, in particular, the first number, which indicates the database. The number 2 means that tempdb has the problem.

If PAGELATCH_UP waits are seen on the first page in any of the files in tempdb — that is, page 2:1:1 or 2:4:1 (essentially, 2:N:1 for any N) — then this indicates that the PFS (Page Free Space) page is exhibiting latch contention. This can be confirmed by looking at sys.dm_os_buffer_descriptors:

SELECT page_type FROM sys.dm_os_buffer_descriptors

WHERE database_id = 2 AND page_id = 1;

A common reaction to any kind of contention in tempdb is to increase the number of data files it uses. It is good practice to have more than one tempdb data file in a multi-threaded environment.

The PFS_PAGE resource must be updated whenever data is inserted into a table without a clustered index to locate a page with enough free space for the insert.

One common cause of this type of contention is the use of multi-statement table-valued functions. Like a scalar function, a multi-statement table-valued function is executed in a separate context. The tempdb database is used to store the results of multi-statement table-valued functions, and it is here that contention could occur. The storage used by tempdb for the results of a multi-statement table-valued function must be managed, which involves the PFS_PAGE resource (using UP latches, because the information being updated is not table data, which would require an EX latch), as it determines where new records can be placed, and it marks them as free once the results have been consumed by the outer query.

Knowing SQL Server

Saturday, December 7, 2024

Latches

No comments:

Post a Comment

Blog Archive