The
column you use for the clustered index will (because the key is used as the row
locator), become a part of every index for your table, so it has heavy
implications for all indexes. Because of this, for a typical OLTP system, a
very common practice is to choose a surrogate key value, often the primary key
of the table, since the surrogate can be kept very small.
Using the
surrogate key as the clustering key is great, not only because is it a small
key (most often the data type is an integer that requires only 4 bytes or
possibly less using compression) but because it’s always a unique value. A
nonunique clustering key has a 4-byte uniquifier tacked onto its value when
keys are not unique. It also helps the optimizer that an index has only unique
values, because it knows immediately that for an equality operator, either 1 or
0 values will match. Because the surrogate key is often used in joins, it’s
helpful to have smaller keys for the primary key.
The
clustered index won’t always be used for the surrogate key or even the primary key.
Other possible uses can fall under the following types:
·
Range queries: Having
all the data in order usually makes sense when there’s data that you often need
to get a range, such as from A to F.
·
Data that’s always
accessed sequentially: Obviously, if the data needs to be accessed in a given
order, having the data already sorted in that order will significantly improve
performance.
·
Queries that return
large result sets.
The
choice of how to pick the clustered index depends on a couple factors, such as
how many other indexes will be derived from this index, how big the key for the
index will be, and how often the value will change. When a clustered index
value changes, every index on the table must also be touched and changed, and
if the value can grow larger, well, then we might be talking page splits.
Another thing that is good about using the
clustered index on a monotonously increasing value is that page splits over the
entire index are greatly decreased. The table grows only on one end of the
index, and while it does need to be rebuilt occasionally using ALTER INDEX
REORGANIZE or ALTER INDEX REBUILD, you don’t end up with page splits all over
the table. By looking in the dynamic management view
sys.dm_db_index_physical_stats, you can use REBUILD on indexes with greater
than 30 percent fragmentation and use REORGANIZE otherwise.
In general the clustered indexes are good for queries that:
- Contain a large number of distinct values.
- Have a WHERE clause that returns values based on BETWEEN, > or < conditions.
- Read columns that are accessed sequentially - that is if lname, fname, minit are accessed often in queries, then you should create an index on columns in that order.
- Have a GROUP BY and / or ORDER BY clause that accesses the first few or all columns comprising the clustered index.
Point 3 might need some extra explanation. SQL Server cannot take advantage of the index (clustered or not) if the query does not specify the columns in the same order as they were submitted at index creation. This means, if you have an index on lname, fname, minit in the employee table and your query selects fname, minit, lname, the index cannot be used.
No comments:
Post a Comment