Knowing SQL Server: Clustered Index

The column you use for the clustered index will (because the key is used as the row locator), become a part of every index for your table, so it has heavy implications for all indexes. Because of this, for a typical OLTP system, a very common practice is to choose a surrogate key value, often the primary key of the table, since the surrogate can be kept very small.

Using the surrogate key as the clustering key is great, not only because is it a small key (most often the data type is an integer that requires only 4 bytes or possibly less using compression) but because it’s always a unique value. A nonunique clustering key has a 4-byte uniquifier tacked onto its value when keys are not unique. It also helps the optimizer that an index has only unique values, because it knows immediately that for an equality operator, either 1 or 0 values will match. Because the surrogate key is often used in joins, it’s helpful to have smaller keys for the primary key.

The clustered index won’t always be used for the surrogate key or even the primary key. Other possible uses can fall under the following types:

· Range queries: Having all the data in order usually makes sense when there’s data that you often need to get a range, such as from A to F.

· Data that’s always accessed sequentially: Obviously, if the data needs to be accessed in a given order, having the data already sorted in that order will significantly improve performance.

· Queries that return large result sets.

The choice of how to pick the clustered index depends on a couple factors, such as how many other indexes will be derived from this index, how big the key for the index will be, and how often the value will change. When a clustered index value changes, every index on the table must also be touched and changed, and if the value can grow larger, well, then we might be talking page splits.

Another thing that is good about using the clustered index on a monotonously increasing value is that page splits over the entire index are greatly decreased. The table grows only on one end of the index, and while it does need to be rebuilt occasionally using ALTER INDEX REORGANIZE or ALTER INDEX REBUILD, you don’t end up with page splits all over the table. By looking in the dynamic management view sys.dm_db_index_physical_stats, you can use REBUILD on indexes with greater than 30 percent fragmentation and use REORGANIZE otherwise.

In general the clustered indexes are good for queries that:

Contain a large number of distinct values.
Have a WHERE clause that returns values based on BETWEEN, > or < conditions.
Read columns that are accessed sequentially - that is if lname, fname, minit are accessed often in queries, then you should create an index on columns in that order.
Have a GROUP BY and / or ORDER BY clause that accesses the first few or all columns comprising the clustered index.

Point 3 might need some extra explanation. SQL Server cannot take advantage of the index (clustered or not) if the query does not specify the columns in the same order as they were submitted at index creation. This means, if you have an index on lname, fname, minit in the employee table and your query selects fname, minit, lname, the index cannot be used.

Knowing SQL Server

Saturday, July 4, 2015

Clustered Index

No comments:

Post a Comment

Blog Archive