Monday, October 31, 2011

Backup strategies

Full Backup—Simple Mode
When you follow this strategy, the DBA sets the database recovery model to simple and performs periodic full database backups. This is the simplest backup strategy, and the simplest to restore. However, full backups also take the longest to run, and this strategy is not feasible in most scenarios. You can use it in the following cases:
• Development Databases: databases used in development or testing environments, provided you can afford to lose a day's work; otherwise, consider a Full and Log backup strategy.
• Stage Databases: databases used in a data warehouse to store intermediate data extracted from source systems.
• Read-Only Databases: databases that contain only read-only information, for example subscriber databases that store information for reporting purposes.
• System Databases: master, msdb, and distribution.
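In SQL Server this strategy boils down to two statements; the sketch below assumes a development database called DevDB and a local backup path, both placeholders:

```sql
-- Set the recovery model to simple, so the log truncates on checkpoint.
ALTER DATABASE DevDB SET RECOVERY SIMPLE;

-- Periodic full backup; INIT overwrites the previous backup set.
BACKUP DATABASE DevDB
TO DISK = N'D:\Backups\DevDB_Full.bak'
WITH INIT;
```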

Log & Full Backup
This strategy is similar to the previous one; however, instead of changing the recovery model of the database, you back up the log just before the full backup. This approach captures not only the data, but also the changes that occurred since the last backup. The benefits of the Log & Full backup strategy are:
1) If the database files and the log are physically separated and the database hard drive fails, you can back up the log and lose no data.
2) In case of a software failure, human error (end user or DBA), or security breach, the DBA can back up the transaction log first and then restore the database to a point in time, reducing the amount of lost data. Keeping the full recovery model on allows you to replay transactions up to the last known good state.
3) If the database has multiple files and one fails, you can first back up the tail of the transaction log (which includes the transactions that occurred since the last log backup), then use the last full backup to restore only the failed file, and finally apply the log backups to restore transactional integrity.
4) Transaction log backups can be used in forensic analysis.
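As a sketch, the recovery sequence in case 2 above might look like this in SQL Server (database name, paths, and timestamp are placeholders):

```sql
-- 1) Capture the tail of the log, even if the data files are damaged.
BACKUP LOG SalesDB
TO DISK = N'D:\Backups\SalesDB_Tail.trn'
WITH NO_TRUNCATE;

-- 2) Restore the last full backup without recovering the database yet.
RESTORE DATABASE SalesDB
FROM DISK = N'D:\Backups\SalesDB_Full.bak'
WITH NORECOVERY;

-- 3) Replay the tail of the log up to a point just before the failure.
RESTORE LOG SalesDB
FROM DISK = N'D:\Backups\SalesDB_Tail.trn'
WITH STOPAT = '2011-10-31 09:55:00', RECOVERY;
```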

Full & Log Backup
In this scenario, the DBA schedules one full backup and multiple log backups between full backups. This plan reduces the amount of information lost in case of total server collapse, because you can schedule log backups more frequently than full backups.
Consider the use of the Full & Log backup strategy in 8x5 databases, and non-volatile 24x7 databases. For example, you may schedule nightly Full Database backups and Log backups every four hours.
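A sketch of that schedule in plain T-SQL; in practice each statement would run from its own SQL Server Agent job (names and paths are placeholders):

```sql
-- Nightly job: full backup, overwriting the previous one.
BACKUP DATABASE SalesDB
TO DISK = N'D:\Backups\SalesDB_Full.bak'
WITH INIT;

-- Every four hours: log backup, appended to the backup device.
BACKUP LOG SalesDB
TO DISK = N'D:\Backups\SalesDB_Log.trn'
WITH NOINIT;
```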

Full – Differential – Log
Sometimes the information stored in a database is so valuable that you want to back it up very often. However, relying on log backups alone is not feasible because, in case of failure, you would have to restore every log backup taken since the last full backup. In this case, the Full – Differential – Log strategy may help.
When using Full, Differential, and Log backups, the DBA schedules full backups occasionally, differential backups often, and log backups very often. Here, occasionally means as frequently as needed to minimize the impact of a full backup, yet often enough to keep differential backups from growing too large; often means as frequently as needed to minimize the impact of restoring too many log backups; and very often means as frequently as needed to minimize data-loss exposure. Two example implementations of this strategy:
1) Full backup every month, differential backups every night, and log backups every two hours.

2) Full backup every week, differential backups every 6 hours, and logs every fifteen minutes.
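The three backup types of this strategy, sketched in T-SQL (names, paths, and frequencies are placeholders):

```sql
-- Occasionally (e.g. weekly): full backup, the base for everything else.
BACKUP DATABASE SalesDB
TO DISK = N'D:\Backups\SalesDB_Full.bak'
WITH INIT;

-- Often (e.g. every 6 hours): differential, changes since the last FULL backup.
BACKUP DATABASE SalesDB
TO DISK = N'D:\Backups\SalesDB_Diff.bak'
WITH DIFFERENTIAL, INIT;

-- Very often (e.g. every 15 minutes): log backup, changes since the last LOG backup.
BACKUP LOG SalesDB
TO DISK = N'D:\Backups\SalesDB_Log.trn'
WITH NOINIT;
```

To restore, you apply the last full backup, then the most recent differential, then every log backup taken after that differential.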

Backup strategies, briefly

  • Combine different backup types to create a backup strategy.
  • Do not back up tempdb.
  • Unless you plan to change its contents, do not back up the model database.
  • Use the Full Backup simple mode strategy only in development, read only, stage or system databases.
  • Schedule the Log backup before the Full backup.
  • Use the Log & Full backup strategy in 8x5 databases where you can afford to lose up to one day of work.
  • Consider the use of the Full & Log backup strategy in 8x5 databases, and non-volatile 24x7 databases.
  • Use the Full, Differential and Log strategy in volatile 24x7 databases, when the value of your data requires very frequent log backups, and in nonpartitioned Data marts.
  • Consider the use of a Full - File/Filegroup – Log Backup strategy in Very Large Databases (VLDB) or partitioned Data marts/Data warehouses.
  • Schedule backup operations when database activity is low.
  • Back up first to disk, whenever possible. Consider using a File Server to store the backups.
  • When using a File server to store backups, consider using a private LAN trunk to avoid general network congestion.
  • Do not use the same physical disk that holds the database files or log files for backup purposes.

Monday, October 24, 2011

Index Pages

Index entries are stored as rows on index pages in a format similar to the format used for data rows on data pages. Index entries store the key values and pointers to lower levels of the index, to the data pages, or to individual data rows.
Index entries are usually much smaller than a data row in a data page, and index pages are much more densely populated than data pages. If a data row has 200 bytes (including row overhead), there are 10 rows per 2K page. An index on a 15-byte field has about 100 rows per index page (the pointers require 4–9 bytes per row, depending on the type of index and the index level).

Indexes can have multiple levels:
• Root level
• Leaf level
• Intermediate level

The root level is the highest level of the index. There is only one root page. If an allpages-locked table is very small, so that the entire index fits on a single page, there are no intermediate or leaf levels, and the root page stores pointers to the data pages. Data-only-locked tables always have a leaf level between the root page and the data pages. For larger tables, the root page stores pointers to the intermediate-level index pages or to leaf-level pages.

The lowest level of the index is the leaf level. At the leaf level, the index contains a key value for each row in the table, and the rows are stored in sorted order by the index key:
• For clustered indexes on allpages-locked tables, the leaf level is the data itself; no separate index level contains one index row for each data row.
• For nonclustered indexes and clustered indexes on data-only-locked tables, the leaf level contains the index key value for each row and a pointer (page number plus row offset) to the row on the data page.

For these indexes, the leaf level is the level just above the data; it contains one index row for each data row. Index rows on the index page are stored in key-value order.

Composite Index

When you include more than one column in an index, it’s referred to as a composite index. As the number of columns grows, or the number of bytes in the key grows, the effectiveness of the index is reduced. The problem is that the index is sorted by the values of the first column, so the second column in the index is more or less only useful if you also filter on the first. Even so, a composite index is often good to have when users query with predicates on all of the columns involved. The order of the columns in the index is important with respect to whether the composite index can and will be used by a given query. There are a couple of important considerations:
• Which column is most selective? If one column contains unique or mostly unique values, it is a good candidate for the first column. The key point is that the index is sorted by its first column. Searching on the second column alone is less valuable (though queries using only the second column can still scan the index leaf pages for values).
• Which column is used most often without the other columns? One composite index can serve several different queries, even when only the first column of the index is used in some of them.
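A hypothetical example of how column order plays out (table and column names are made up):

```sql
-- CustomerID is assumed to be the more selective column, so it leads the index.
CREATE INDEX ix_orders_cust_date
ON Orders (CustomerID, OrderDate);

-- Can seek: the predicate is on the leading column.
SELECT OrderDate FROM Orders WHERE CustomerID = 42;

-- Cannot seek: the predicate is on the second column only,
-- so at best the optimizer scans the index leaf pages.
SELECT CustomerID FROM Orders WHERE OrderDate = '20111001';
```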

Composite indexes have these advantages:
• A composite index provides opportunities for index covering.
• If queries provide search arguments on each of the keys, the composite index requires fewer I/Os than the same query using an index on any single attribute.
• A composite index is a good way to enforce the uniqueness of multiple attributes.

Good choices for composite indexes are:
• Lookup tables
• Columns that are frequently accessed together
• Columns used for vector aggregates
• Columns that make a frequently used subset from a table with very wide rows

The disadvantages of composite indexes are:
• Composite indexes tend to have large entries. This means fewer index entries per index page and more index pages to read.
• An update to any attribute of a composite index causes the index to be modified. The columns you choose should not be those that are updated often.

Poor choices are:
• Indexes that are nearly as wide as the table

• Composite indexes where only a minor key is used in the where clause

Tuesday, October 4, 2011

Indexing - Useful tips

Consider using a clustered index in the following situations:
  • The physical ordering supports the range retrievals of important queries—that is, queries that use BETWEEN and LIKE.
  • Few duplicate values mean that an equality test (=) returns few rows.
  • Many duplicate values mean that an equality test (=) returns many rows.
  • The clustered index key is used in the ORDER BY clause of critical queries.
  • The clustered index supports the GROUP BY clause of critical queries.
  • For a given row in the outer table of a join, there are few rows that match in the inner table. A clustered index on the join column in the inner table will be beneficial.
  • For a given row in the outer table of a join, there are many rows that match in the inner table. A clustered index on the join column in the inner table will be beneficial.
When to avoid using a clustered index:
  • On a volatile column. A volatile column is a column that is updated frequently. This would result in the data row moving around the table repeatedly.
Some general guidelines are as follows:
  • Do not create an index on a column that is not very selective. An example of this would be a column that contained a status flag containing two or three values. It is unlikely that such an index would be used by the query optimizer.
  • Be careful when creating indexes with large keys. Fewer keys can be held in an index page, resulting in many index pages and deeper indexes. Take care with a large key in a clustered index. This will be used as the pointer in all the non-clustered indexes on the table.
  • Regularly check the levels of internal and external page fragmentation with DBCC SHOWCONTIG. Tidy up by rebuilding indexes. Make sure that there is enough free space in the database to rebuild clustered indexes. Another approach is to use the Database Maintenance Wizard.
  • Consider using DBCC INDEXDEFRAG on tables where there is little opportunity for maintenance—for example, a 24 × 7 system.
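A sketch of those maintenance commands (database, table, and index names are placeholders):

```sql
-- Report internal and external fragmentation for one index.
DBCC SHOWCONTIG ('Orders', 'ix_orders_cust_date');

-- Lightweight, online defragmentation of the index leaf level.
DBCC INDEXDEFRAG (SalesDB, 'Orders', 'ix_orders_cust_date');

-- Full rebuild; needs free space and takes locks, so schedule it off-hours.
DBCC DBREINDEX ('Orders', 'ix_orders_cust_date');
```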
Consider using a non-clustered index in the following situations:
  • Few duplicate values mean that an equality test (=) returns few rows.
  • The non-clustered index key is used in the ORDER BY clause of critical queries.
  • The non-clustered index supports the GROUP BY clause of critical queries.
  • For a given row in the outer table of a join, there are few rows that match in the inner table. A non-clustered index on the join column in the inner table will be beneficial.
  • A critical query can be efficiently covered.
  • Many applications will require the selection of a row by the primary key. This is a single-row selection and therefore would normally benefit from the creation of an index containing the same columns as the primary key. Since it is not common to request ranges of primary keys, a non-clustered index is probably the best option. If a primary key constraint is created, the index will be automatically created; it is recommended that this be a non-clustered index.
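A hypothetical covering example: if a critical query touches only CustomerID, OrderDate, and TotalDue, an index keyed on all three can answer it from the leaf pages alone, never touching the table (names are made up):

```sql
-- Every column the query references is an index key,
-- so the query is covered by the index.
CREATE NONCLUSTERED INDEX ix_orders_covering
ON Orders (CustomerID, OrderDate, TotalDue);

SELECT OrderDate, TotalDue
FROM Orders
WHERE CustomerID = 42;
```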
Avoid using a non-clustered index:
  • When a query returns many rows, such as a range retrieval, or when there are many duplicate values returned by an equality test. Also, if, for a given row in the outer table of a join, there are many rows that match in the inner table, a non-clustered index on the join column in the inner table will not be beneficial.
  • On a volatile column. The result may not be as unfavorable as with a clustered index, since the data row will not move; however, the index will still have to be maintained.