Tuesday, August 18, 2015

Parallelism – MAXDOP

(Source: sqlserverplanet.com)

Parallelism is a feature in SQL Server which allows expensive queries to utilize more threads in order to complete quicker. The query optimizer makes the determination of how expensive a query is based upon the Cost Threshold for Parallelism setting set at the SQL Server Instance level. This cost threshold for parallelism is an estimate (roughly defined in seconds) that the query optimizer has determined a statement will take based on an estimated (or cached) execution plan. Generally the queries that qualify for parallelism are high IO queries. In a normal process, a source is read using a single SPID (Server Process ID) and it output using the same SPID. In parallelism, multiple SPIDs are used to read a source (this is known as distributing streams), then an operation may be performed in the streams, then the streams are gathered. Below is an illustration of this:



Although parallelism may seem like a good idea, often times it is not. For OLTP systems that facilitate a lot of user requests parallelism is usually an indication of poorly written queries and/or queries that are in need of indexes. It can also cause issues on servers that have a lot of processors that also have disk IO contention. The reason for this is because parallelism will by default spawn as many SPIDs as their are processors. In other words, if you have a rogue query joining a hundred million records that does not qualify for parallelism, then by itself, only one process will be reading the hundred million records. However if it qualifies for parallelism and the server has 16 processors, then you will now have 16 threads each trying to get a piece of the hundred million records powered by their own processor. I’ve seen this bring many servers to their knees.
One indication that parallelism is occurring on your system is when you run sp_who2 and you see the same SPID listed more than once. This is indicative of a large query being broken into multiple streams only to join back into a single result later. Some streams may finish their operations prior to other streams completions. When this happens it results in a wait type of CX_Packet.  When this wait type is common, then you are waiting for parallel streams to finish.
As a summary, you do not want parallelism to be enabled unless you are sure that:
  1. You have more than enough resources
  2. You have a well optimized SQL Server
  3. Your server is mainly used for ETL purposes
 When SQL Server runs on a computer with more than one processor or CPU, it detects the best degree of parallelism, that is the number of processors employed to run a single statement, for each query that has a parallel execution plan. You can use the max degree of parallelism option to limit the number of processors to use for parallel plan execution and to prevent run-away queries from impacting SQL Server performance by using all available CPUs. The default value for MAXDOP is 0 (zero) and can be set or viewed using (sp_configure). A value of 0 means that SQL Server will use all processors if a query runs in parallel.
MAXDOP should not be 0 and should not be greater than half the number of visible schedulers.
For example if you have a quad core processor with hyper-threading enable, then you will have 4x2=8 visible schedulers (each scheduler is mapped to an individual processor). Schedulers can be seen by running this query and would be the rows that have a scheduler_id < 255: If you are unsure of the above values then a generic setting for MAXDOP should not be more than 8.  So if you run the command below and have more than 8 you should set this value to a maximum of 8.
SELECT *
FROM sys.dm_os_schedulers

WHERE scheduler_id < 255;

No comments:

Post a Comment