Knowing SQL Server: NonClustered Index

Non-clustered index on clustered

While having to scan two indexes probably seem like more work; overall, it’s usually better than having direct pointers to the table because only minimal reorganization is required for any modification of the values in the table. When the size of the clustering key is adequately small, this method is faster overall than having pointers directly to the table. The benefit of the key structure is certainly true when we talk about modification operations.

Because the clustering key is the same regardless of physical location, only the lowest level of the clustered index need know where the physical data is. Add to this that the data is organized sequentially, and the overhead of modifying indexes is significantly lowered. Of course, this benefit is only true if the clustering key rarely, or never, changes. Therefore, the general suggestion is to make the clustering key a small nonchanging value, such as an identity.

Non-clustered index on heap

One of the worst aspects of the heap structure is that when a row must be moved to a different physical location, the pointers in the index aren’t changed. Instead, the data is moved to a different page, and on the original location of the data, a forwarding pointer is left to point to the new page where the data is now. So if the row where name ='Dog' had moved, you might end up with following situation to extend the number of steps required to pick up the data. All existing indexes that have the old pointer simply go to the old page and follow the new pointer on that page to the new location of the data.

If you are careful with your structures, data should rarely be moved around within a heap, but you have to be careful if you’re often updating data to a larger value in a variable length column that’s used as an index key, it’s possible that a row may be moved to a different page. This adds another step to finding the data, and if the data is moved to a page on a different extent, another read to the database. This forwarding pointer is immediately followed when scanning the table, causing possible horrible performance over time if it’s not managed.

Space is not reused in the heap without rebuilding the table (by selecting into another table, adding a clustered index temporarily, or in 2008, using the ALTER TABLE command with the REBUILD option).

Caution Using a GUID for a surrogate key is becoming the vogue these days, but be careful. GUIDs are 16 bytes wide, which is a fairly large amount of space, but that is really the least of the problem. They are random values, in that they generally aren’t monotonically increasing, and a new GUID could sort anywhere in a list of other GUIDs. Clustering on a random value is generally horrible for inserts, because if you don’t leave spaces on each page for new rows, you are likely to have page splitting. If you have a very active system, the constant page splitting can destroy your system, and the opposite effect, to leave lots of empty space, can be just as painful, as you make reads far less effective. The only way to make GUIDs a reasonably acceptable type is to use the NEWSEQUENTIALID () function (or one of your own) to build sequential GUIDS, but it only works with uniqueidentifier columns in a default constraint.

When Not to Use a Nonclustered Index

Nonclustered indexes are not suitable for queries that retrieve a large number of rows. Such queries are better served with a clustered index, which doesn’t require a separate bookmark lookup to retrieve a data row. Since a bookmark lookup requires additional logical reads to get to the data page besides the logical read on the nonclustered index page, the cost of a query using a nonclustered index increases significantly for a large number of rows, such as when in a loop join that requires one lookup after another. The SQL Server query optimizer takes this cost into effect and accordingly can discard the nonclustered index when retrieving a large result set.

If your requirement is to retrieve a large result set from a table, then having a nonclustered index on the filter criterion (or the join criterion) column will probably not be useful unless you use a special type of nonclustered index called a covering index.

Benefits of a Nonclustered Index over a Clustered Index

As you learned in the previous section, a nonclustered index is preferred over a clustered index in the following situations:

· When the index key size is large

· To avoid the overhead cost associated with a clustered index since rebuilding the clustered index rebuilds all the nonclustered indexes of the table

· To resolve blocking by having a database reader work on the pages of a nonclustered index, while a database writer modifies other columns (not included in the nonclustered index) in the data page; in this case, the writer working on the data page won’t block a reader that can get all the required column values from the nonclustered index without hitting the base table.

· When all the columns (from a table) referred to by a query can be safely accommodated in the nonclustered index itself

A covering index is a useful technique for reducing the number of logical reads of a query. Adding columns using the INCLUDE statement makes this functionality easier to achieve without adding to the number of columns in an index or the size of the index key since the included columns are stored only at the leaf level of the index.

The INCLUDE is best used in the following cases:

· You don’t want to increase the size of the index keys, but you still want to make the index a covering index.

· You’re indexing a data type that can’t be indexed (except text, ntext, and images).

· You’ve already exceeded the maximum number of key columns for an index (although this is a problem best avoided).

Knowing SQL Server

Saturday, July 4, 2015

NonClustered Index

No comments:

Post a Comment

Blog Archive