Saturday, November 23, 2024

TempDB

 There are 3 types of objects stored in tempdb: 

 

  1. User Objects 

  2. Internal Objects 

  3. Version Stores 

 
A user object can be a temporary table, a table variable or a table returned by a table-valued function. It can also be a regular table created in the tempdb database. 
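As a minimal sketch of each kind of user object (table and column names here are hypothetical):

```sql
-- Each of these statements creates a user object in tempdb:

CREATE TABLE #OrderStaging (OrderID int, Amount money);   -- local temporary table

DECLARE @Totals TABLE (OrderID int, Amount money);        -- table variable

CREATE TABLE tempdb.dbo.Scratch (ID int);                 -- regular table created in tempdb
```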

 

Internal objects are created and managed by SQL Server internally. Their data or metadata cannot be accessed. Here are some examples of internal objects in tempdb: 

  1. Query Intermediate Results for Hash Operations 

  2. Sort Intermediate Results 

  3. Contents of LOB Data Types 

  4. Query Result of a Static Cursor 

Unlike operations on user objects, operations on internal objects in tempdb are not logged, since they never need to be rolled back. But internal objects do consume space in tempdb. Each internal object occupies at least 9 pages (one IAM page and 8 data pages). Depending on the nature of the queries, tempdb can grow substantially due to internal objects when queries that process large amounts of data are executed on the instance. 
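You can check how much tempdb space internal objects currently reserve with a query like this (a page is 8 KB, hence the division to get MB):

```sql
-- Space currently reserved by internal objects across all tempdb files
SELECT SUM(internal_object_reserved_page_count) AS [internal object pages],
       SUM(internal_object_reserved_page_count) * 8.0 / 1024 AS [internal object space in MB]
FROM tempdb.sys.dm_db_file_space_usage;
```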

Version stores are used for storing row versions generated by transactions in any database on the instance. The row versions are required by features such as snapshot isolation, AFTER triggers, and online index builds. Row versions are stored in tempdb only when row versioning is actually required. 
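To see which databases are generating row versions and how much tempdb space the version store is consuming, a sketch (this DMV is available on SQL Server 2016 and later):

```sql
-- Version store space in tempdb, broken down by the database generating versions
SELECT database_id, reserved_page_count, reserved_space_kb
FROM sys.dm_tran_version_store_space_usage;
```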

 

There are three dynamic management views that make troubleshooting tempdb space usage much easier. 

The views are: 

  • sys.dm_db_file_space_usage: space usage per tempdb file, broken down into user object, internal object, and version store pages. 

  • sys.dm_db_session_space_usage: pages allocated and deallocated by each session. 

  • sys.dm_db_task_space_usage: pages allocated and deallocated by each task, useful for finding currently running queries that consume tempdb. 
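For example, to find the sessions currently burning the most tempdb space through internal objects (sorts, hashes, and so on), a sketch:

```sql
-- Top tempdb consumers by internal object allocations, per session
SELECT session_id,
       SUM(user_objects_alloc_page_count)     AS user_object_pages,
       SUM(internal_objects_alloc_page_count) AS internal_object_pages
FROM tempdb.sys.dm_db_task_space_usage
GROUP BY session_id
ORDER BY SUM(internal_objects_alloc_page_count) DESC;
```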

What does Checkpoint do on tempdb? 

When you insert a row into a table, SQL Server does not immediately write it to the data (mdf) file. Instead, it writes the change to the log (ldf) file and keeps the row in a page in memory. Pages that are in memory but have not yet been physically written to the data file are called dirty pages. A dirty page is written to the data file by one of three processes: checkpoint, lazy writer, or eager writer. These are the very basics of I/O writing in SQL Server, and you need to understand them to be able to troubleshoot and explain complex problems related to tempdb and the whole SQL engine. 

Rows may therefore exist only in the log file and in dirty pages in memory. If an unexpected crash happens and your SQL Server stops, once it is back the SQL engine must redo any transaction that was written to the transaction log file and apply those changes to the data (mdf) files. This whole thing doesn’t make sense for tempdb, right? Since tempdb is recreated on SQL Server start, there is no need for tempdb to recover transactions from the log file into the data file, and therefore no need to flush dirty tempdb pages to disk. This is actually why SQL Server doesn’t do automatic checkpoints for tempdb (with a few exceptions, such as when the log reaches 70% full or a manual CHECKPOINT is issued). 

There is no recovery need for tempdb as it is recreated on SQL service start, so indirect checkpoints on tempdb may not make much sense from a recovery perspective. However, the indirect checkpoint feature is still important to ensure the dirty pages in tempdb do not continue to take away buffer pool pages from user database workload as there is no automatic checkpoint to flush pages for tempdb. 
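Indirect checkpoints are controlled by the TARGET_RECOVERY_TIME database setting. As a sketch (the 60-second value here is only an illustration, not a recommendation; on SQL Server 2016 and later, tempdb typically ships with a nonzero value already):

```sql
-- Enable indirect checkpoints on tempdb by setting a target recovery time
ALTER DATABASE tempdb SET TARGET_RECOVERY_TIME = 60 SECONDS;
```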

One of the reasons to enable indirect checkpoints on tempdb is to help in scenarios where dirty tempdb pages take over the buffer pool data cache and put pressure on the lazy writer as it flushes them: exactly the scenario we are analyzing. 

If the indirect checkpoint had previously flushed the dirty tempdb pages, a query wouldn’t need to wait (SLEEP_TASK) for the lazy writer to flush them. But there is a small problem with indirect checkpoints on tempdb: any non-logged “bulk” operation that qualifies for an “eager write” in tempdb is not a candidate to be flushed by the recovery writer (the internal thread that runs the indirect checkpoint). 

This raises an important question: which data load operations are minimally logged (ML) in tempdb? This is important to know because minimally logged operations in tempdb will not be flushed by the indirect checkpoint. The following summarizes which load operations on tempdb are minimally logged and which are not. 

  • INSERT INTO #TMP + SELECT: Yes. Since local temporary tables are private to the creating session, there is no need to use TABLOCK to get an ML operation. 

  • INSERT INTO #TMP WITH(TABLOCK) + SELECT: Yes. Note that TABLOCK is required to get the insert operator running in parallel on SQL Server 2016+. 

  • SELECT INTO #TMP: Yes. Since the tempdb recovery model is always set to simple, a SELECT + INTO will be ML. 

  • SELECT INTO tempdb.dbo.TMP: Yes. 

  • INSERT INTO tempdb.dbo.TMP WITH(TABLOCK) + SELECT: Yes. Since this is a regular user table that can be accessed by any session, it requires TABLOCK to be ML. The same applies to a global (##) temporary table. 

  • INSERT INTO tempdb.dbo.TMP + SELECT: No. So the only way to avoid qualifying for an ML operation in tempdb is to use a regular or a global (##) temporary table and not specify TABLOCK. 
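A quick way to check whether a given load was minimally logged is to compare the log bytes the transaction generates; far fewer bytes than the data volume loaded indicates minimal logging. A sketch, assuming a source table dbo.SourceTable exists (hypothetical name):

```sql
-- Compare log bytes generated by a load into tempdb
BEGIN TRANSACTION;

SELECT * INTO #TMP FROM dbo.SourceTable;        -- candidate for minimal logging

SELECT database_transaction_log_bytes_used
FROM sys.dm_tran_database_transactions
WHERE database_id = 2;                          -- tempdb

COMMIT TRANSACTION;
```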

 

 

Capacity planning for tempdb in SQL Server 

Determining the appropriate size for tempdb in a SQL Server production environment depends on many factors. As described earlier, these factors include the existing workload and the SQL Server features that are used. We recommend that you analyze the existing workload by performing the following tasks in a SQL Server test environment: 

  • Set autogrow on for tempdb. 

  • Run individual queries or workload trace files and monitor tempdb space use. 

  • Execute index maintenance operations such as rebuilding indexes, and monitor tempdb space. 

  • Use the space-use values from the previous steps to predict your total workload usage. Adjust this value for projected concurrent activity, and then set the size of tempdb accordingly. 

 

-- Determining the amount of free space in tempdb
SELECT SUM(unallocated_extent_page_count) AS [free pages],
       (SUM(unallocated_extent_page_count) * 1.0 / 128) AS [free space in MB]
FROM tempdb.sys.dm_db_file_space_usage;

 

Optimize tempdb performance in SQL Server 

The size and physical placement of the tempdb database can affect the performance of a system. For example, if the size that's defined for tempdb is too small, part of the system-processing load might be taken up with autogrowing tempdb to the size required to support the workload every time you restart the instance of SQL Server (tempdb is recreated at its configured initial size on each restart). 

If possible, use instant file initialization to improve the performance of growth operations for data files. 
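You can verify whether instant file initialization is granted to the engine's service account with a query like this (the instant_file_initialization_enabled column is available on SQL Server 2016 SP1 and later):

```sql
-- Check instant file initialization status for the SQL Server service
SELECT servicename, instant_file_initialization_enabled
FROM sys.dm_server_services
WHERE servicename LIKE 'SQL Server (%';
```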

Preallocate space for all tempdb files by setting the file size to a value large enough to accommodate the typical workload in the environment. Preallocation prevents tempdb from expanding too often, which affects performance. The tempdb database should be set to autogrow to increase disk space for unplanned exceptions. 

Data files should be of equal size within each filegroup, because SQL Server uses a proportional-fill algorithm that favors allocations in files with more free space. Dividing tempdb into multiple data files of equal size provides a high degree of parallel efficiency in operations that use tempdb. 

Set the file growth increment to a reasonable size and set it to the same increment in all data files, to prevent the tempdb database files from growing by too small a value. If the file growth is too small compared to the amount of data that's being written to tempdb, tempdb might have to constantly expand. That will affect performance. 
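Putting the preallocation and growth-increment advice together, a sketch (the logical file names tempdev and temp2 are the common defaults but may differ on your instance, and the sizes are illustrative assumptions, not recommendations):

```sql
-- Preallocate equally sized tempdb data files with identical growth increments
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 8192MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = temp2,   SIZE = 8192MB, FILEGROWTH = 512MB);
```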

 

Causes of Contention 

When the tempdb database is heavily used, SQL Server may experience contention when it tries to allocate pages. Depending on the degree of contention, this may cause queries and requests that involve tempdb to be briefly unresponsive. 

During object creation, two (2) pages must be allocated from a mixed extent and assigned to the new object. One page is for the Index Allocation Map (IAM), and the second is for the first page for the object. SQL Server tracks mixed extents by using the Shared Global Allocation Map (SGAM) page. Each SGAM page tracks about 4 gigabytes of data. 

To allocate a page from a mixed extent, SQL Server must scan the Page Free Space (PFS) page to determine which mixed page is free to be allocated. The PFS page keeps track of the free space available on every page, and each PFS page tracks 8,088 pages (about 64 MB). Appropriate synchronization is maintained when making changes to the PFS and SGAM pages, which can stall other modifiers for short periods. 

When SQL Server searches for a mixed page to allocate, it always starts the scan on the same file and SGAM page. This causes intense contention on the SGAM page when several mixed-page allocations are underway.  
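This kind of allocation-page contention shows up as PAGELATCH waits on tempdb pages. A sketch for spotting it on a live instance (in the wait_resource value, page 1 and every multiple of 8,088 is a PFS page; page 3 and every multiple of 511,232 is an SGAM page):

```sql
-- Sessions currently latch-waiting on tempdb pages (database_id = 2)
SELECT session_id, wait_type, wait_resource, wait_time
FROM sys.dm_exec_requests
WHERE wait_type LIKE 'PAGELATCH%'
  AND wait_resource LIKE '2:%';
```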
