Knowing SQL Server: Statistics

Statistics options

There are options that affect when and how statistics are created and updated. These options are configurable at the database level only.

AUTO_CREATE_STATISTICS option

When the automatic create statistics option, AUTO_CREATE_STATISTICS, is ON, the Query Optimizer creates statistics on individual columns in the query predicate, as necessary, to improve cardinality estimates for the query plan. These single-column statistics are created on columns that don't already have a histogram in an existing statistics object. The AUTO_CREATE_STATISTICS option does not determine whether statistics get created for indexes, and it does not generate filtered statistics. It applies strictly to single-column statistics for the full table.

When the Query Optimizer creates statistics as a result of using the AUTO_CREATE_STATISTICS option, the statistics name starts with _WA.

AUTO_UPDATE_STATISTICS option

When the automatic update statistics option, AUTO_UPDATE_STATISTICS, is ON, the Query Optimizer determines when statistics might be out-of-date and then updates them when they are used by a query. This action is also known as statistics recompilation.

The Query Optimizer determines when statistics might be out-of-date by counting the number of row modifications since the last statistics update and comparing that count to a threshold. The threshold is based on the table cardinality, which can be defined as the number of rows in the table or indexed view. Marking statistics as out-of-date based on row modifications occurs even when the AUTO_UPDATE_STATISTICS option is OFF. When the AUTO_UPDATE_STATISTICS option is OFF, statistics are not updated, even when they are marked as out-of-date.

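The threshold depends on the SQL Server version and the database compatibility level. Up to SQL Server 2014 (12.x), for a table with more than 500 rows the threshold is 500 + (0.20 * n), where n is the table cardinality. Starting with SQL Server 2016 (13.x) under database compatibility level 130, the threshold is the minimum of 500 + (0.20 * n) and SQRT(1,000 * n). For example, in SQL Server 2014, if your table contains 20,000 rows, the calculation is 500 + (0.20 * 20,000) = 4,500, and the statistics become eligible for an update after every 4,500 modifications. In SQL Server 2016, if your table contains 2 million rows, the threshold is the minimum of 500 + (0.20 * 2,000,000) = 400,500 and SQRT(1,000 * 2,000,000) = 44,721, so the statistics become eligible for an update after every 44,721 modifications.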
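The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch of the two threshold formulas for tables with more than 500 rows. The function names are made up for illustration and are not part of SQL Server.

```python
import math

def legacy_threshold(rows: int) -> float:
    """Threshold as computed up to SQL Server 2014 (assumes rows > 500)."""
    return 500 + 0.20 * rows

def dynamic_threshold(rows: int) -> float:
    """Threshold as computed from SQL Server 2016 on (assumes rows > 500).

    Takes the smaller of the legacy value and SQRT(1,000 * rows), so the
    threshold grows much more slowly on large tables.
    """
    return min(legacy_threshold(rows), math.sqrt(1000 * rows))

print(round(legacy_threshold(20_000)))            # 4500
print(round(legacy_threshold(2_000_000)))         # 400500
print(math.floor(dynamic_threshold(2_000_000)))   # 44721
```

For the 2-million-row table, the square-root term is roughly nine times smaller than the legacy value, which is why large tables get their statistics refreshed far more often under the newer formula.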

The AUTO_UPDATE_STATISTICS option applies to statistics objects created for indexes, single columns in query predicates, and statistics created with the CREATE STATISTICS statement. This option also applies to filtered statistics. You can use the sys.dm_db_stats_properties dynamic management function to accurately track the number of rows changed in a table and decide whether to update statistics manually.
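As a rough illustration of that manual decision, the Python sketch below flags statistics whose share of modified rows exceeds a cutoff. It assumes the values have already been read from sys.dm_db_stats_properties (rows and modification_counter are real columns of that function). The helper name, the 10% cutoff, and the sample statistics names are hypothetical and chosen only for the example.

```python
def pick_stale_statistics(stats, max_modified_ratio=0.10):
    """Return names of statistics whose modified-row ratio exceeds the cutoff.

    `stats` is a list of dicts shaped like rows from sys.dm_db_stats_properties.
    'name' is assumed to come from a join to sys.stats; the cutoff is an
    arbitrary illustration, not a SQL Server setting.
    """
    stale = []
    for s in stats:
        rows = s["rows"]
        if not rows:  # no rows recorded (empty table or statistics never built)
            continue
        if s["modification_counter"] / rows > max_modified_ratio:
            stale.append(s["name"])
    return stale

# Hypothetical sample data, not taken from a real database.
sample = [
    {"name": "IX_Orders_CustomerId", "rows": 2_000_000, "modification_counter": 30_000},
    {"name": "_WA_Sys_00000003_0EA330E9", "rows": 20_000, "modification_counter": 4_200},
    {"name": "ST_Empty", "rows": None, "modification_counter": None},
]
print(pick_stale_statistics(sample))  # ['_WA_Sys_00000003_0EA330E9']
```

In this sample the second statistics object has 21% of its rows modified yet sits just under the 4,500-modification threshold from the earlier example, which is the kind of case where a manual update might be worth considering.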

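In SQL Server 2014 (12.x), statistics on memory-optimized tables are never updated automatically. Starting with SQL Server 2016 (13.x), automatic statistics updates are supported for memory-optimized tables when the database compatibility level is 130 or higher.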

AUTO_UPDATE_STATISTICS_ASYNC

The asynchronous statistics update option, AUTO_UPDATE_STATISTICS_ASYNC, determines whether the Query Optimizer uses synchronous or asynchronous statistics updates. By default, the asynchronous statistics update option is OFF, and the Query Optimizer updates statistics synchronously. The AUTO_UPDATE_STATISTICS_ASYNC option applies to statistics objects created for indexes, single columns in query predicates, and statistics created with the CREATE STATISTICS statement.

Statistics updates can be either synchronous (the default) or asynchronous.

With synchronous statistics updates, queries always compile and execute with up-to-date statistics. When statistics are out-of-date, the Query Optimizer waits for updated statistics before compiling and executing the query.

With asynchronous statistics updates, queries compile with existing statistics even if the existing statistics are out-of-date. The Query Optimizer could choose a suboptimal query plan if statistics are out-of-date when the query compiles. Statistics are typically updated soon thereafter. Queries that compile after the statistics update completes benefit from the updated statistics.

Asynchronous statistics update is performed by a background request. When the request is ready to write updated statistics to the database, it attempts to acquire a schema modification (Sch-M) lock on the statistics metadata object. If a different session is already holding a lock on the same object, the asynchronous statistics update is blocked until the Sch-M lock can be acquired. Similarly, sessions that need to acquire a schema stability (Sch-S) lock on the statistics metadata object to compile a query may be blocked by the asynchronous statistics update background session, which is already holding or waiting to acquire the Sch-M lock. Therefore, for workloads with very frequent query compilations and frequent statistics updates, using asynchronous statistics may increase the likelihood of concurrency issues due to lock blocking.

AUTO_DROP

Starting with SQL Server 2022 (16.x), the auto drop option is enabled by default on all new and migrated databases. The AUTO_DROP property allows the creation of statistics objects in a mode such that a subsequent schema change will not be blocked by the statistics object, but instead the statistics will be dropped as necessary. In this way, manually created statistics with auto drop enabled behave like auto-created statistics.

INCREMENTAL

Applies to: SQL Server 2014 (12.x) and later.

When the INCREMENTAL option of CREATE STATISTICS is ON, the statistics created are per-partition statistics. When OFF, the statistics tree is dropped and SQL Server recomputes the statistics. The default is OFF. When new partitions are added to a large table, statistics should be updated to include the new partitions. However, the time required to scan the entire table (FULLSCAN or SAMPLE option) might be quite long. Also, scanning the entire table isn't necessary because only the statistics on the new partitions might be needed. The incremental option creates and stores statistics on a per-partition basis and, when updated, refreshes statistics only on those partitions that need new statistics.

Incremental statistics are not supported for the following statistics types:

Statistics created with indexes that are not partition-aligned with the base table.

Statistics created on Always On readable secondary databases.

Statistics created on read-only databases.

Statistics created on filtered indexes.

Statistics created on views.

Statistics created on internal tables.

Statistics created with spatial indexes or XML indexes.

Even without manually created statistics, the Query Optimizer already creates statistics in the following ways:

The Query Optimizer creates statistics for indexes on tables or views when the index is created. These statistics are created on the key columns of the index. If the index is a filtered index, the Query Optimizer creates filtered statistics on the same subset of rows specified for the filtered index.

The Query Optimizer creates statistics for single columns in query predicates when AUTO_CREATE_STATISTICS is ON.

When creating statistics with the CREATE STATISTICS statement, we recommend keeping the AUTO_CREATE_STATISTICS option ON so that the Query Optimizer continues to routinely create single-column statistics for query predicate columns.

When a query predicate contains multiple columns that have cross-column relationships and dependencies, statistics on the multiple columns might improve the query plan. Statistics on multiple columns contain cross-column correlation statistics, called densities, that are not available in single-column statistics. Densities can improve cardinality estimates when query results depend on data relationships among multiple columns.
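To make the idea of densities concrete, the Python sketch below computes a density for each leading column prefix as 1 / (number of distinct values). It then compares the rows-per-combination estimate from the multi-column density with what you would get by treating the two columns as independent. The table contents, column names, and function name are invented for illustration, and the real optimizer's estimation logic is more involved than this.

```python
def density_vector(rows, columns):
    """Return {prefix: density} for every leading prefix of `columns`.

    Density is 1 / (number of distinct value combinations for the prefix).
    """
    result = {}
    for i in range(1, len(columns) + 1):
        prefix = tuple(columns[:i])
        distinct = {tuple(r[c] for c in prefix) for r in rows}
        result[prefix] = 1 / len(distinct)
    return result

# Hypothetical data: zip codes are strongly tied to their city.
rows = [
    {"city": "Seattle", "zip": "98101"},
    {"city": "Seattle", "zip": "98101"},
    {"city": "Seattle", "zip": "98104"},
    {"city": "Portland", "zip": "97201"},
    {"city": "Portland", "zip": "97201"},
    {"city": "Portland", "zip": "97205"},
]

d = density_vector(rows, ["city", "zip"])
zip_density = density_vector(rows, ["zip"])[("zip",)]
total = len(rows)

# Average rows per (city, zip) pair using the multi-column density.
correlated = d[("city", "zip")] * total
# Same estimate if the columns were assumed to be independent.
independent = d[("city",)] * zip_density * total

print(d[("city",)], d[("city", "zip")])  # 0.5 0.25
print(correlated, independent)           # 1.5 0.75
```

Because every zip code belongs to exactly one city, multiplying the single-column densities underestimates the rows per combination by half. The multi-column density is what removes that gap.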

You can create filtered statistics by using the CREATE STATISTICS statement with the WHERE clause to define the filter predicate expression. When the Query Optimizer creates statistics for single columns and indexes, it creates the statistics for the values in all rows. When queries select from a subset of rows, and that subset of rows has a unique data distribution, filtered statistics can improve query plans.

Saturday, November 23, 2024
