SQL Server’s query
optimizer is a cost-based optimizer; it decides on the best data access
mechanism and join strategy by identifying the selectivity, how unique the data
is, and which columns are used in filtering the data (meaning via the WHERE or JOIN
clause). Statistics exist with an index, but they also exist on columns without
an index that are used as part of a predicate. A nonclustered index is a great
way to retrieve data that is covered by the index, whereas with queries that
need columns outside the key, a clustered index can work better. With a large
result set, going to the clustered index or table directly is usually more beneficial.
Up-to-date
information on data distribution in the columns referenced as predicates helps
the optimizer determine the query strategy to use. In SQL Server, this
information is maintained in the form of statistics, which are essential for
the cost-based optimizer to create an effective query execution plan. As long
as you ensure that the default statistical settings for the database are set,
the optimizer will be able to do its best to determine effective processing
strategies dynamically. Also, as a safety measure while troubleshooting
performance, you should ensure that the automatic statistics maintenance
routine is doing its job as desired.
Statistics on an Indexed Column
The usefulness of
an index is fully dependent on the statistics of the indexed columns; without
statistics, SQL Server’s cost-based query optimizer can’t decide upon the most
effective way of using an index. To meet this requirement, SQL Server
automatically creates the statistics of an index key whenever the index is
created. It isn’t possible to turn this feature off.
As data changes,
the data-retrieval mechanism required to keep the cost of a query low may also
change. For example, if a table has only one matching row for a certain column
value, then it makes sense to retrieve the matching rows from the table by
going through the nonclustered index on the column. But if the data in the table
changes so that a large number of rows are added with the same column value,
then using the nonclustered index no longer makes sense. To be able to have SQL
Server decide this change in processing strategy as the data changes over time,
it is vital to have up-to-date statistics.
SQL Server can
keep the statistics on an index updated as the contents of the indexed column
are modified. By default, this feature is turned on and is configurable through
the Properties ➤
Options ➤
Auto Update Statistics setting of a database. Updating statistics consumes
extra CPU cycles. It’s also possible to update the statistics asynchronously.
This means when a query would normally cause statistics to be updated, instead
that query proceeds with the old statistics, and the statistics are updated
offline. This can speed up the response time of some queries, such as when the
database is large or when you have a short timeout period.
Drawbacks of Outdated Statistics
As explained in
the preceding section, the auto update statistics feature allows the optimizer
to decide on an efficient processing strategy for a query as the data changes.
If the statistics become outdated, however, then the processing strategies
decided on by the optimizer may not be applicable for the current data set and
thereby will degrade performance.
Statistics on a Non-indexed Column
Sometimes you may
have columns in join or filter criteria without any index. In addition to
statistics on indexes, SQL Server can build statistics on columns with no
indexes. The information on data distribution, or the likelihood of a
particular value occurring in a non-indexed column, can help the query
optimizer determine an optimal processing strategy. This benefits the query
optimizer even if it can’t use an index to actually locate the values. SQL
Server automatically builds statistics on non-indexed columns if it deems this
information valuable in creating a better plan, usually when the columns are
used in a predicate.
In general, you
should not disable the automatic creation of statistics on nonindexed columns.
One of the scenarios in which you may consider disabling this feature is while
executing a series of ad hoc SQL activities that you will not execute again. In
such a case, you must decide whether you want to pay the cost of automatic statistics
creation to get a better plan in this one case and affect the performance of
other SQL Server activities.
No comments:
Post a Comment