Saturday, November 23, 2024

Troubleshooting performance issues - 2

Common Causes of High CPU Usage

Regardless of the size and expense of the hardware and technology that underpins your SQL Server installations, there is always the risk that one or more poorly tuned T-SQL statements will cause severe overutilization of resources. 

For every statement sent to SQL Server for execution, the query optimizer attempts to find the most efficient way to retrieve the data, i.e. the one that is the least expensive in terms of use of CPU, I/O and memory resources. The plan it produces will only be as good as the data access paths that are available to it, and the information it has regarding the data and its distribution. If appropriate indexes are missing, or queries are written in such a way that potentially useful indexes are ignored, then the optimizer will not be able to come up with a truly optimal plan.  

 

Likewise, if the information the optimizer has regarding the data, via index statistics, is inaccurate, then the optimizer may select a sub-optimal plan, since the information that it is using to calculate the cost of plans is inaccurate. 

Another possibility is that the optimizer will produce a plan that is optimal for one execution of the query (typically the one that triggered the optimization) and not for others. This is commonly known as "parameter sniffing," though it should more accurately be referred to as "inappropriate parameter sniffing," since parameter sniffing is in general a good thing. 

 

Outdated statistics 

The SQL Server Optimizer uses statistics to calculate the estimated cardinality for various query operators. That cardinality, essentially number of rows, affects the cost of the operators. The cost of the operators, in turn, determines the cost of the plan. If the cardinality estimation is wrong, because of outdated statistics, the cost that the optimizer calculates for the operators will also be wrong, leading the optimizer to select a plan that has a low estimated cost, but a very high actual cost when it is executed. 

 

The most common side effect of incorrect statistics is that the optimizer estimates on the low-side for the number of rows, and so chooses operators that are very good for small numbers of rows, such as nested loop joins and key lookups. When the query is executed and it turns out that a large number of rows need to be processed, the chosen operators scale badly and the plan is highly inefficient. 

One way to tell if there is a problem with statistics for a particular query is to run the query in Management Studio, return the actual execution plan, and examine the estimated and actual row counts for any index seek and scan operations within the execution plan. If the two counts are significantly different, bearing in mind that the estimated count is per execution of the operator and the actual count is a total for all executions of the operator, then one possibility is that the statistics are outdated. 
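As a complement to comparing the row counts in the plan, the age and churn of the statistics can be checked directly. A minimal sketch, using the AdventureWorks Sales.Customer table referenced later in this post; note that the sys.dm_db_stats_properties function assumed here is only available from SQL Server 2008 R2 SP2 / 2012 SP1 onwards, while STATS_DATE works on earlier builds:

```sql
-- Last update date and rows-modified counter for each statistics object
SELECT  s.name AS stats_name ,
        STATS_DATE(s.object_id, s.stats_id) AS last_updated ,
        sp.rows ,
        sp.modification_counter
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   s.object_id = OBJECT_ID('Sales.Customer');

-- If the statistics turn out to be stale, refresh them
UPDATE STATISTICS Sales.Customer WITH FULLSCAN;
```

A high modification_counter relative to rows, combined with an old last_updated date, supports the "outdated statistics" diagnosis.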

 

 

Non-SARGable predicates 

 

SARGable, where SARG stands for Search Argument, simply means that a predicate can be used in an index seek operation. The general rule for SARGable predicates is that the column should be compared directly (by equality or inequality) to an expression; any function applied to the column will make the predicate non-SARGable. In other words, WHERE SomeFunction(Column) = @Value is not SARGable, whereas WHERE Column = SomeOtherFunction(@Value) is SARGable.  

Note that SARGability doesn't rule out the use of operators such as LIKE or BETWEEN (both inequality comparisons) or IN (treated as a set of equality comparisons). 
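The rule above can be made concrete with a rewrite. A sketch, using the AdventureWorks Sales.SalesOrderHeader table as an illustrative example (the table and column names are assumptions, not from this post):

```sql
-- Non-SARGable: the function on OrderDate prevents an index seek
SELECT  SalesOrderID
FROM    Sales.SalesOrderHeader
WHERE   YEAR(OrderDate) = 2008;

-- SARGable rewrite: the column is compared directly to expressions,
-- so an index on OrderDate can be used for a seek
SELECT  SalesOrderID
FROM    Sales.SalesOrderHeader
WHERE   OrderDate >= '20080101'
  AND   OrderDate <  '20090101';
```

Both queries return the same rows, but only the second gives the optimizer a seekable range.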

Non-SARGable predicates can result in table or index scans and, similar to the case of missing indexes, this will cause significant CPU usage as SQL has to read and process far more rows than necessary. 

This is a fairly common problem in many databases; I often see functions such as UPPER, LTRIM, ISNULL being used in queries, either in the joins or in the WHERE clause, and in many cases there is simply no need for them. If the columns use a case-insensitive collation, then uppercase and lowercase values are considered equal, and the use of the UPPER or LOWER functions does nothing other than degrade performance. 

Similarly, with string comparisons SQL ignores trailing spaces, removing the need for the RTRIM function. 

 

Dealing with NULLs is always a fun one. The ISNULL function is often used unnecessarily due to a misunderstanding of how NULLs work in predicates. For example, the following two WHERE clause predicates are completely equivalent in function. 

 

WHERE ISNULL(SomeCol,0) > 0 

WHERE SomeCol > 0 

 

In the first one, any row with a NULL value will be excluded because the NULL is converted to zero and the filter is for values greater than zero. In the second one, any row with a NULL value will be excluded because NULLs do not ever return true when compared to any value using the =, <>, <, > or any of the other comparison operators. They can only return true for IS NULL or IS NOT NULL checks. Hence, both predicates achieve the same result, but only the second one allows use of index seeks. 

 

 

Implicit conversions 

An implicit conversion results from a comparison of unequal data types. SQL cannot compare values of differing types; it must convert one of the values involved to the data type of the other in order to perform the comparison. 

When an implicit conversion occurs on a column that is used in a WHERE or FROM clause, the SQL Server Optimizer dictates a conversion of all the column values before the filter can be applied. This means that, during the query execution, the query processor will convert the lower precedence data type to the higher precedence data type before applying the filter or join condition. This means that, as with the case of functions on the column, the predicate is considered non-SARGable and so index seeks cannot be used, SQL must process more rows than necessary to get the results, and this leads to higher CPU usage. 

 

A common manifestation of this problem is the comparison of NVARCHAR parameters to columns that are of type VARCHAR. 

 

SELECT  p.FirstName ,
        p.LastName ,
        c.AccountNumber
FROM    Sales.Customer AS c
        INNER JOIN Person.Person AS p ON c.PersonID = p.BusinessEntityID
WHERE   AccountNumber = N'AW00029594' 
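Since NVARCHAR has a higher data type precedence than VARCHAR, every AccountNumber value must be converted before the filter is applied. The fix is simply to match the literal (or parameter) type to the column, as sketched below:

```sql
-- Using a VARCHAR literal (no N prefix) matches the column's type,
-- avoiding the implicit conversion and allowing an index seek
SELECT  p.FirstName ,
        p.LastName ,
        c.AccountNumber
FROM    Sales.Customer AS c
        INNER JOIN Person.Person AS p ON c.PersonID = p.BusinessEntityID
WHERE   AccountNumber = 'AW00029594';
```

The same applies to parameters: declare them as VARCHAR when the column they are compared against is VARCHAR.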

 

 

Parameter sniffing 

Parameter sniffing is a process used by SQL Server when creating an execution plan for a stored procedure, function, or parameterized query. The first time the plan is compiled, SQL Server will examine, or "sniff", the input parameter values supplied, and use them, in conjunction with the column statistics, to estimate the number of rows that will be touched by the query. It then uses that estimate in its costing of various possible execution plans.  

A problem only arises if the values that were passed as input parameters on initial plan creation, result in a row count that is atypical of that which will result from future executions of the procedure. Parameter sniffing only occurs at the time a plan is compiled or recompiled, and all subsequent executions of the stored procedure, function, or parameterized query will use the same plan. 

 

During the initial compile, only the values of the input parameters can be sniffed as any local variables will have no value. If a statement within the batch is recompiled, both parameter and variable values can be sniffed, as the variables will, by that time, have values. 
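When sniffing of an atypical value does cause a bad plan, there are query-level hints that control the behavior without resorting to instance-wide changes. A hedged sketch; the procedure and parameter names are hypothetical, with Sales.Customer used for illustration:

```sql
CREATE PROCEDURE dbo.GetCustomersByTerritory  -- hypothetical procedure
    @TerritoryID INT
AS
BEGIN
    -- OPTION (RECOMPILE) recompiles the statement on every execution,
    -- sniffing the current parameter value each time (at a CPU cost)
    SELECT  CustomerID ,
            AccountNumber
    FROM    Sales.Customer
    WHERE   TerritoryID = @TerritoryID
    OPTION  ( RECOMPILE );

    -- Alternatives (SQL Server 2008+):
    --   OPTION ( OPTIMIZE FOR ( @TerritoryID = 1 ) )  -- cost for a known typical value
    --   OPTION ( OPTIMIZE FOR UNKNOWN )               -- ignore the sniffed value,
    --                                                 -- use average density instead
END;
```

These hints trade off compilation cost against plan stability; which one is appropriate depends on how skewed the data distribution is.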

 

Trace Flag 4136 

SQL Server 2008 introduced an option to turn parameter sniffing off altogether for a SQL Server instance, by simply enabling Trace Flag 4136. This option was added in SQL Server 2008 SP1 CU7, and SQL Server 2008 R2 CU2, and also back-ported into SQL Server 2005 in SP3 CU9. 
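For completeness, the trace flag can be enabled instance-wide from a query window, though it will not survive a service restart unless -T4136 is also added as a startup parameter:

```sql
-- Enable TF 4136 globally (the -1 argument) until the next restart
DBCC TRACEON (4136, -1);

-- Verify the flag's status
DBCC TRACESTATUS (4136);
```

As the rest of this section argues, this should be a last resort, so test the effect on representative workloads before enabling it anywhere important.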

When the query optimizer is able to "sniff" the value of a parameter, it uses this value, along with the statistics histogram, to provide an accurate estimate of the number of records that will be returned. As discussed earlier, this is only problematic if the initial parameter value turns out to be completely atypical. 

When parameter sniffing is prevented, the optimizer can't find out the parameter value and so can't use the statistics histogram. Instead, it makes what is often a less accurate estimation of the number of rows that will be returned, by assuming a uniform data distribution across all data values. 

In short, although this option is available, parameter sniffing is beneficial to most procedures that are written to use typical values. Turning parameter sniffing off may inadvertently affect these plans in a negative way. As such, this Trace Flag should be considered an absolute last resort if nothing else fixes the problem. 

 

 

Inappropriate parallelism 

SQL Server is designed to be able to make use of multiple processors when processing user requests. Query parallelism is the mechanism used by the SQL query execution engine to split the work of a query into multiple threads, each of which will execute on a separate scheduler. Queries are parallelized at the operator level; in other words, if the query runs in parallel, some of the query operators may run in their parallel form, while others may not. 

 

When a query is submitted to SQL Server for processing, the query optimizer compiles an execution plan that has been optimized to allow the query to execute in the fastest manner possible. If the estimated cost of executing the plan serially exceeds the 'cost threshold for parallelism' sp_configure option, the number of logical CPUs available to SQL Server is greater than one, and the 'max degree of parallelism' sp_configure option is set to the default of zero or greater than one, the plan produced will include parallelism. The Degree of Parallelism (DOP) is not included as a part of the plan; this is, instead, determined at the time of execution based on the number of logical processors, the 'max degree of parallelism' sp_configure or, if the MAXDOP query hint is being used, the value specified by the hint, and the number of available worker threads. 
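The two gating options can be inspected and adjusted as follows. A sketch; the threshold value of 25 is purely illustrative, and both options take effect immediately after RECONFIGURE without a restart:

```sql
-- Inspect the two options that gate parallel plan selection
SELECT  name , value_in_use
FROM    sys.configurations
WHERE   name IN ( 'cost threshold for parallelism' ,
                  'max degree of parallelism' );

-- Example: raise the cost threshold from the default of 5 so that only
-- genuinely expensive queries qualify for parallelism
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 25;
RECONFIGURE;
```

Raising the cost threshold is often a gentler first step than capping max degree of parallelism, since it leaves large queries free to parallelize.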

Parallel query processing can reduce the time required to process a query by horizontally partitioning the input data, distributing the partitions across multiple logical CPUs, and simultaneously executing the same operation across multiple processor cores. 

This can be very beneficial to data warehouse and reporting operations, which typically involve a small number of large queries that process high volumes of data, with only a few requests executing concurrently. By splitting the request across multiple OS threads on multiple processor cores, the optimizer increases the utilization of the hardware resources by spreading the load across all of the processors on the server, resulting in a reduction of total execution time. 

 

 

Max degree of parallelism 

One of the more common online recommendations is to disable parallelism entirely by setting the max degree of parallelism to 1. There are cases where this configuration might make sense, for example, true OLTP workloads where all of the transactions are small and there are a lot of transactions executing concurrently. These types of databases rarely exist today, and disabling parallelism entirely is more likely to reduce performance in the long term. 

Over the years I've made a number of different recommendations about how to configure max degree of parallelism. For example, in an SMP system, setting it to half the number of available physical processor cores, or to the number of physical cores on a single processor die, or even setting it to 1 to disable parallelism entirely. Today, I only make a recommendation based on analysis of the query workload, and a review of the wait types of associated workers and subtasks which are executing using parallelism. 

In particular, I'll analyze occurrences of the CXPACKET session wait type. In most systems, CXPACKET is the symptom and not the problem; there is often a different underlying wait type that can be seen in sys.dm_os_waiting_tasks for the session. By focusing on that underlying wait type, a better decision regarding the appropriate max degree of parallelism option can be made.  

For example, if the underlying wait type is PAGEIOLATCH_SH then the parallel operation is waiting on a read from the disk I/O subsystem, and reducing the max degree of parallelism won't resolve the root problem; it will just reduce the number of workers being used in the system, and reduce the accumulated wait time for the CXPACKET wait type. However, this may reduce the additional load the parallelism operations place on the disk I/O subsystem, and buy you time to scale up the I/O performance of the server. 
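One way to surface those underlying waits is to look at what the other workers of CXPACKET-waiting sessions are doing. A sketch against sys.dm_os_waiting_tasks:

```sql
-- For sessions currently accumulating CXPACKET waits, list the other
-- wait types their workers are experiencing; these often reveal the
-- real bottleneck (e.g. PAGEIOLATCH_SH for disk I/O)
SELECT  wt.session_id ,
        wt.wait_type ,
        wt.wait_duration_ms
FROM    sys.dm_os_waiting_tasks AS wt
WHERE   wt.session_id IN ( SELECT session_id
                           FROM   sys.dm_os_waiting_tasks
                           WHERE  wait_type = 'CXPACKET' )
  AND   wt.wait_type <> 'CXPACKET'
ORDER BY wt.wait_duration_ms DESC;
```

Because this DMV is a point-in-time snapshot, it is worth sampling it several times while the problem workload is running.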

 

 

Diagnosing inappropriate parallelism 

The best way to determine if parallel processing is causing a resource bottleneck in a specific system is to look at the wait statistics and latch statistics for an instance of SQL Server. When a bottleneck exists during the parallel execution of a query, the CXPACKET wait type shows up as one of the top waits for SQL Server. This wait type is set whenever a parallel process has to wait in the exchange iterator for another worker to continue processing.  

Since multiple workers are forced to wait when this occurs, the volume of CXPACKET waits will generally exceed the underlying root wait type being exhibited. There are scenarios where it may not be possible to eliminate the underlying wait type; for example, when the disk I/O subsystem can't keep up with the demand required by the parallel execution of a query, the root wait type may be an IO_COMPLETION, ASYNC_IO_COMPLETION, or PAGEIOLATCH_* wait type, and scaling out the I/O subsystem is not possible.  

When this occurs, reducing the level of parallelism to a degree that still allows parallel processing to occur without bottlenecking in the disk I/O subsystem can improve overall system performance. It is possible that CXPACKET waits in conjunction with other wait types, for example LATCH_* and SOS_SCHEDULER_YIELD, do show that parallelism is the actual problem, and further investigation of the latch stats on the system will validate if this is actually the case.  

The sys.dm_os_latch_stats DMV contains information about the specific latch waits that have occurred in the instance, and if one of the top latch waits is ACCESS_METHODS_DATASET_PARENT, in conjunction with CXPACKET, LATCH_*, and SOS_SCHEDULER_YIELD wait types as the top waits, the level of parallelism on the system is the cause of bottlenecking during query execution, and reducing the 'max degree of parallelism' sp_configure option may be required to resolve the problems. 
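The latch check described above can be performed with a simple query; the counts are cumulative since the instance last started (or since the stats were cleared):

```sql
-- Top latch waits for the instance; look for ACCESS_METHODS_DATASET_PARENT
-- near the top when investigating parallelism bottlenecks
SELECT TOP ( 10 )
        latch_class ,
        waiting_requests_count ,
        wait_time_ms
FROM    sys.dm_os_latch_stats
ORDER BY wait_time_ms DESC;
```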

 

 

 

Memory Management 

 

Min and max server memory 

SQL Server offers two instance-level settings that can be used to control how memory is allocated to, and removed from, the buffer pool: the min server memory and max server memory sp_configure options. Note that in SQL Server 2000, 2005, 2008 and 2008 R2, these settings apply only to the buffer pool size and do not include the memory that may be allocated by SQL Server outside of the buffer pool. 

The min server memory option specifies the minimum size to which SQL Server can shrink the buffer pool when under memory pressure; it does not specify the minimum amount of memory that SQL Server will initially allocate. During memory ramp up, the memory usage of an instance slowly increases and the buffer pool is grown to meet the needs of the requests being executed. The max server memory option specifies the maximum amount of memory that SQL Server can use for the buffer pool, which is primarily used for caching data pages in memory. 
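Both options are set through sp_configure. A minimal sketch; the values shown (a 4 GB floor and a 28 GB ceiling, as might suit a dedicated 32 GB server) are illustrative assumptions and should be sized for your own workload and OS requirements:

```sql
-- min/max server memory are advanced options
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sp_configure 'min server memory (MB)', 4096;   -- floor for the buffer pool
EXEC sp_configure 'max server memory (MB)', 28672;  -- ceiling for the buffer pool
RECONFIGURE;
```

Both settings take effect without a restart, though reclaiming memory down to a lowered max happens gradually.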

 

 

Lock pages in memory 

While the SQL Server process is designed to be self-tuning with regard to its own memory allocation needs, it can't account for memory demands made by other processes running on the server, or by operations such as file copies. Its self-tuning nature means that it will respond to memory pressure, as and when signaled by the operating system. 

This means that, ultimately, the operating system is in control of the response to memory pressure, and one of the ways that the OS responds is by trimming, and possibly paging out, the working set of processes that are consuming memory, including, of course, the SQL Server process. When this occurs, memory allocations that are backed by physical memory are written to virtual memory in the system page file on disk. If the process's working set is relatively small, this is not generally a big problem, but on 64-bit SQL Servers with large amounts of RAM installed, where all memory is committed using VAS only and so is pageable, this can significantly impact performance. 

If SQL Server experiences a working set trim, or memory gets paged out, SQL will write a message in the error log as follows: 

 

A significant part of sql server process memory has been paged out. This may result in a performance degradation. Duration: 0 seconds. Working set (KB): 16484, committed (KB): 6239692, memory utilization: 0%. 

 

First, Windows would ask SQL Server to trim its working set, which would prompt de-allocation of memory down to the min server memory setting. If SQL Server didn't trim enough memory, or didn't trim fast enough, the OS could force part of SQL Server's memory allocation out of physical memory and into the paging file. At this point, the buffer cache is still allocated, but it is backed not by physical memory but by the page file on disk, which is very slow and causes the performance issues. 

To prevent this from happening on 64-bit SQL Servers, the Lock Pages in Memory privilege can be assigned to the SQL Server service account, in which case memory is allocated to the buffer pool using the Win32 function AllocateUserPhysicalPages, which is provided by the AWE API. 

 

 

Diagnosing Memory Pressure 

If SQL Server is forced to operate with an insufficient amount of memory, it will be able to store fewer data pages in the buffer pool. As a result, higher physical I/O is needed to bring data pages from disk into the buffer pool as they are requested, and performance will degrade. These data pages may soon be subsequently flushed from the buffer pool, to make room for other pages, only to be then read back into the buffer pool, in a repetitive cycle known as buffer pool churn. 

The heightened disk I/O observed during buffer pool churn often means that it's easily mistaken for a disk I/O subsystem problem, rather than a memory pressure problem. 

If a system consistently performs read operations that require more memory than is available to SQL Server, each read of the data will result in flushing the buffer cache and cause physical I/O against the disk subsystem. In addition, if the workload requires significant amounts of data in sort or aggregate operations during query processing, the data can spill over to tempdb worktables due to a lack of available memory to accommodate that processing, again further increasing the I/O demand on the system. 

 

Free Pages 

The Free Pages counter reflects the total number of free pages that exist for the SQL Server buffer pool, allowing for immediate allocations by an executing request without having to release additional pages from cache to satisfy the request. The number of free pages in the system should never reach zero. This counter should be monitored in conjunction with the Page Life Expectancy counter and the Free List Stalls counter, to gauge whether the system is actually under memory pressure. If the PLE experiences drops in value that correlate to low or zero values for Free Pages, and the system is also experiencing Free List Stalls at the same time, then these are a sure sign that the instance is experiencing memory pressure and could benefit from having additional memory allocated to the buffer pool. 
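These buffer manager counters can be read from within SQL Server itself, without Performance Monitor. A sketch; note that the 'Free pages' counter assumed here was removed in later versions (SQL Server 2012 onwards), so on newer builds only the remaining counters will return rows:

```sql
-- Current values of the buffer manager counters discussed in this section
SELECT  RTRIM(counter_name) AS counter_name ,
        cntr_value
FROM    sys.dm_os_performance_counters
WHERE   object_name LIKE '%Buffer Manager%'
  AND   counter_name IN ( 'Page life expectancy' ,
                          'Free pages' ,
                          'Free list stalls/sec' ,
                          'Lazy writes/sec' );
```

The "/sec" counters are cumulative in this DMV, so take two samples a known interval apart and divide the difference by the elapsed seconds to get a true per-second rate.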

 

Free List Stalls/sec 

Free List Stalls occur whenever a request has to wait for a free page in the buffer pool. If the number of stalls exceeds zero frequently or consistently over a period of time, this is a sign of memory pressure. 

 

Lazy Writes/sec 

The Lazy Writes/sec counter reflects the number of buffer pages that have been flushed by the Lazy Writer process, outside of a normal checkpoint operation, allowing the buffer to be reused for other pages. If you observe Lazy Writes occurring in conjunction with a low PLE, a low number of free pages, and the occurrence of Free List Stalls, this is a sign that the workload is exceeding the amount of memory that is available to the buffer pool, and additional memory needs to be added to the server. 

 

SQL Server:Memory Manager 

The counters relating to the SQL Server:Memory Manager object provide useful insight into overall memory consumption and memory management issues on the server. 

Total Server Memory (KB) and Target Server Memory (KB) 

Respectively, these counters represent the total amount of memory that has been allocated by SQL Server and the amount of memory that SQL Server wants to commit. 

When the Target Server Memory (KB) counter exceeds the Total Server Memory (KB) counter, the SQL Server process wants to commit more memory than is available on the server, which can be a sign of memory pressure. Generally, SQL Server will reduce its memory demands to match the available memory on the server, or what is specified by the 'max server memory' option of sp_configure, so these two counters are not the first places to start, when looking to confirm a memory pressure issue on SQL Server. 

 

Memory Grants Outstanding 

This counter measures the total number of processes that have successfully acquired a workspace memory grant. Low values for this counter, during periods of high user activity or heavy workload, may be a sign of memory pressure, especially if there are a high number of memory grants pending. 

 

Memory Grants Pending 

This counter measures the total number of processes that are waiting for a workspace memory grant. If this value is non-zero, it is a sign that tuning or optimization of the workload should be performed if possible, or that additional memory needs to be added to the server. 

 

 

Memory-related DMVs 

There is also some information regarding memory-related waits and non-buffer pool memory allocations and so on, that can be extracted from the DMVs, such as the sys.dm_os_memory_* objects in the Operating System-related DMVs, or the sys.dm_exec_query_memory_grants DMV. For example: 

 

sys.dm_exec_query_memory_grants can be used to find queries that are waiting (or have recently had to wait) for a memory grant, especially those requesting relatively large memory grants. 

sys.dm_os_memory_cache_counters provides a snapshot of current usage of the memory cache. It includes the multi_pages_kb column, which shows the amount of memory allocated by the multiple-page allocator. 

sys.dm_os_sys_memory summarizes the overall memory condition of the system, including current levels of memory in the system, the cache, and so on. 

sys.dm_os_memory_clerks provides information related to memory clerk processes that manage SQL Server memory. For example, significant memory allocation in the buffer pool associated with the MEMORYCLERK_SQLQERESERVATIONS may indicate insufficient memory in the buffer pool for certain queries to execute. 
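As an example of the first DMV in the list, the following sketch finds requests that are currently stalled waiting on a workspace memory grant:

```sql
-- Requests still waiting for a workspace memory grant
-- (granted_memory_kb is NULL until the grant is made)
SELECT  session_id ,
        requested_memory_kb ,
        granted_memory_kb ,
        wait_time_ms
FROM    sys.dm_exec_query_memory_grants
WHERE   granted_memory_kb IS NULL
ORDER BY requested_memory_kb DESC;
```

Requests at the top of this list, with large requested grants and long wait times, are the first candidates for query tuning when Memory Grants Pending is non-zero.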

 

 

OS instability due to Lock Pages in Memory plus unlimited max server memory 

When Lock Pages in Memory is being used for a SQL Server instance, the default configuration for the max server memory sp_configure option must be changed to limit the amount of memory that the SQL Server instance can allocate for the buffer pool. If the max server memory configuration option is not set for the instance, under the default configuration the instance will allocate all of the memory available on the server for use by the SQL Server buffer pool. The problem with this scenario is that when the Windows OS gets into memory pressure (and it certainly will, as SQL Server commits all of the memory available on the server), it can't page out or trim the SQL Server working set in response to the memory pressure.  

This leaves the OS at the mercy of SQL Server to respond to the memory pressure fast enough to prevent the Windows OS from crashing. The same scenario can occur if an inappropriately high value has been set for max server memory for the instance. For this reason, it is critical that when Lock Pages in Memory is set for a SQL Server instance, the max server memory configuration option should be set low enough to ensure that the Windows OS never gets into memory pressure. 

 

Memory ballooning and memory paging 

Memory ballooning of the guest virtual machines is a process where a specialized "balloon driver," installed as a part of the VM tools in each guest, begins acquiring memory from the VM. This leads to memory pressure inside the VM, which causes "low memory" resource notifications to be set by the Windows OS which, in turn, forces processes running under the guest OS to release memory, which can then be returned to the host server. 

Memory ballooning is the preferred option, but when ballooning of the guest VMs can't release memory fast enough to cope with the host-level memory pressure, the alternative method used by hypervisors is to actively page the guest memory allocations out to disk, which results in severe degradation of performance of the paged-out guests on the system. 

SQL Servers that are running as VMs require special considerations with regard to their memory configurations in the hypervisor, in order to minimize problems associated with ballooning and paging when the host hypervisor comes under memory pressure. 

When a SQL Server VM voluntarily reduces its memory usage as a result of ballooning, this can, in extreme cases, result in SQL Server reducing its buffer pool allocations to the point that little to no memory is being used by SQL Server for caching data. This, in turn, will result in excessive physical disk I/O operations by the instance. 

All of the hypervisors currently supported by Microsoft provide mechanisms for reserving a minimum amount of memory for the VM that is hosting SQL Server, which prevents the hypervisor from ballooning or paging the VM in the event that the host experiences memory pressure. Setting the appropriate reservation for SQL Server is a recommended best practice by the hypervisor vendors. In addition, setting the correct min server memory configuration inside of SQL Server can allow for partial ballooning to occur, while ensuring that SQL Server continues to allocate the buffer pool memory needed to meet the environmental SLAs. 
