Saturday, December 7, 2024

Query Optimization

The job of the Query Optimizer is to take the query tree that was output from the algebrizer and find a “good” way to retrieve the data (results) needed. The query optimization process is based on a principle of cost, which is an abstract measure of work that is used to evaluate different query plan options. However, users should consider cost to be a dimensionless value that doesn’t have any units — its value is derived from comparisons to the cost of other plans in order to find the cheapest one. Therefore, there are no true units for cost values. 

 

The current model provided by the SQL Server team works something like this: 

➤ Is a valid plan cached? If yes, then use the cached plan. If no plan exists, then continue. 

➤ Is this a trivial plan? If yes, then use the trivial plan. If no, then continue. 

➤ Apply simplification. Simplification is a process of normalizing the query tree and applying some basic transformations to additionally “simplify” the tree. 

➤ Is the plan cheap enough? If yes, then use this. If no, then start optimization. 

➤ Start cost-based optimization. 

➤ Phase 0 — Explore basic rules, and hash and nested join options. 

➤ Does the plan have a cost of less than 0.2? If yes, then use this. If no, then continue. 

 

➤ Phase 1 — Explore more rules, and alternate join ordering. If the best (cheapest) plan costs less than 1.0, then use this plan. If not, then if MAXDOP > 0 and this is an SMP system, and the min cost > cost threshold for parallelism, then use a parallel plan. Compare the cost of the parallel plan with the best serial plan, and pass the cheaper of the two to phase 2. 

➤ Phase 2 — Explore all options, and opt for the cheapest plan after a limited number of explorations. 

 

The output of the preceding steps is an executable plan that can be placed in the cache. This plan is then scheduled for execution. You can view the inner workings of the optimization process via the DMV sys.dm_exec_query_optimizer_info. This DMV contains a set of optimization attributes, each with an occurrence and a value. 

 

Parallel Plans 

A parallel plan is any plan for which the Optimizer has chosen to split an applicable operator into multiple threads that are run in parallel. Not all operators are suitable to be used in a parallel plan. The Optimizer will only choose a parallel plan if: 

➤ the server has multiple processors, 

➤ the maximum degree of parallelism setting allows parallel plans, 

➤ the cost threshold for parallelism sql server configuration option is set to a value lower than the lowest cost estimate for the current plan. Note that the value set here is the time in seconds estimated to run the serial plan on a specific hardware configuration chosen by the Query Optimizer team. 

➤ The cost of the parallel plan is cheaper than the serial plan. 

If all these criteria are met, then the Optimizer will choose to parallelize the operation. 

No comments:

Post a Comment