Non-clustered index on clustered
While
having to scan two indexes probably seem like more work; overall, it’s usually
better than having direct pointers to the table because only minimal
reorganization is required for any modification of the values in the table. When
the size of the clustering key is adequately small, this method is faster
overall than having pointers directly to the table. The benefit of the key
structure is certainly true when we talk about modification operations.
Because
the clustering key is the same regardless of physical location, only the lowest
level of the clustered index need know where the physical data is. Add to this
that the data is organized sequentially, and the overhead of modifying indexes
is significantly lowered. Of course, this benefit is only true if the
clustering key rarely, or never, changes. Therefore, the general suggestion is
to make the clustering key a small nonchanging value, such as an identity.
Non-clustered index on heap
One of
the worst aspects of the heap structure is that when a row must be moved to a
different physical location, the pointers in the index aren’t changed. Instead,
the data is moved to a different page, and on the original location of the
data, a forwarding pointer is left to point to the new page where the data is
now. So if the row where name ='Dog' had moved, you might end up with following
situation to extend the number of steps required to pick up the data. All
existing indexes that have the old pointer simply go to the old page and follow
the new pointer on that page to the new location of the data.
If you
are careful with your structures, data should rarely be moved around within a
heap, but you have to be careful if you’re often updating data to a larger
value in a variable length column that’s used as an index key, it’s possible
that a row may be moved to a different page. This adds another step to finding
the data, and if the data is moved to a page on a different extent, another
read to the database. This forwarding pointer is immediately
followed when scanning the table, causing possible horrible performance over
time if it’s not managed.
Space is
not reused in the heap without rebuilding the table (by selecting into another
table, adding a clustered index temporarily, or in 2008, using the ALTER TABLE
command with the REBUILD option).
Caution Using a GUID for a surrogate key is
becoming the vogue these days, but be careful. GUIDs are 16 bytes wide, which
is a fairly large amount of space, but that is really the least of the problem.
They are random values, in that they generally aren’t monotonically increasing,
and a new GUID could sort anywhere in a list of other GUIDs. Clustering on a
random value is generally horrible for inserts, because if you don’t leave
spaces on each page for new rows, you are likely to have page splitting. If you
have a very active system, the constant page splitting can destroy your system,
and the opposite effect, to leave lots of empty space, can be just as painful,
as you make reads far less effective. The only way to make GUIDs a reasonably
acceptable type is to use the NEWSEQUENTIALID () function (or one of your own)
to build sequential GUIDS, but it only works with uniqueidentifier columns in a
default constraint.
When Not to Use a Nonclustered
Index
Nonclustered
indexes are not suitable for queries that retrieve a large number of rows. Such
queries are better served with a clustered index, which doesn’t require a
separate bookmark lookup to retrieve a data row. Since a bookmark lookup
requires additional logical reads to get to the data page besides the logical
read on the nonclustered index page, the cost of a query using a nonclustered
index increases significantly for a large number of rows, such as when in a
loop join that requires one lookup after another. The SQL Server query optimizer
takes this cost into effect and accordingly can discard the nonclustered index
when retrieving a large result set.
If your
requirement is to retrieve a large result set from a table, then having a
nonclustered index on the filter criterion (or the join criterion) column will
probably not be useful unless you use a special type of nonclustered index
called a covering index.
Benefits of a Nonclustered Index
over a Clustered Index
As you
learned in the previous section, a nonclustered index is preferred over a
clustered index in the following situations:
·
When the index key size is large
·
To avoid the overhead cost associated with a clustered index since
rebuilding the clustered index rebuilds all the nonclustered indexes of the
table
·
To resolve blocking by having a database reader work on the pages of a
nonclustered index, while a database writer modifies other columns (not
included in the nonclustered index) in the data page; in this case, the writer
working on the data page won’t block a reader that can get all the required
column values from the nonclustered index without hitting the base table.
·
When all the columns (from a table) referred to by a query can be safely
accommodated in the nonclustered index itself
A
covering index is a useful technique for reducing the number of logical reads
of a query. Adding columns using the INCLUDE statement makes this functionality
easier to achieve without adding to the number of columns in an index or the size
of the index key since the included columns are stored only at the leaf level
of the index.
The
INCLUDE is best used in the following cases:
·
You don’t want to increase the size of the index keys, but you still want
to make the index a covering index.
·
You’re indexing a data type that can’t be indexed (except text, ntext,
and images).
·
You’ve already exceeded the maximum number of key columns for an index
(although this is a problem best avoided).
No comments:
Post a Comment