Saturday, November 12, 2011

Queries - Index chosen by the optimizer

Indexes on the SalesOrder table:
• Clustered index: SalesOrderDetailID
• Non-clustered index: SalesOrderID, OrderQty, UnitPrice

Query and the index operation chosen by the optimizer:
• select top 10000 * from SalesOrder  →  Clustered Index Scan
• select * from SalesOrder where SalesOrderDetailID > 10000 and SalesOrderDetailID < 20000  →  Clustered Index Seek
• select * from SalesOrder where ProductID = 714 and ProductID = 715  →  Constant Scan
• select * from SalesOrder where ProductID = 714 or ProductID = 715  →  Clustered Index Scan
• select SalesOrderID, OrderQty, UnitPrice from SalesOrder where ProductID in (714, 715)  →  Clustered Index Scan
• select OrderQty, UnitPrice from SalesOrder where SalesOrderID = 43659  →  Index Seek
• select OrderQty, UnitPrice from SalesOrder where SalesOrderID > 40000  →  Index Seek
• select * from SalesOrder where SalesOrderID > 40000  →  Clustered Index Scan
• select * from SalesOrder where ProductID = 714 or ProductID = 715 order by SalesOrderID  →  Clustered Index Scan
• select OrderQty, UnitPrice from SalesOrder where SalesOrderID = 43659 order by SalesOrderDetailID  →  Index Seek
• select OrderQty, UnitPrice from SalesOrder order by SalesOrderDetailID  →  Clustered Index Scan
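The setup these observations assume can be sketched as follows (an assumption: the SalesOrder table looks like a copy of AdventureWorks Sales.SalesOrderDetail, and the exact index definitions are illustrative):

-- Illustrative setup only: copy the AdventureWorks detail table
SELECT SalesOrderID, SalesOrderDetailID, ProductID, OrderQty, UnitPrice
INTO   SalesOrder
FROM   AdventureWorks.Sales.SalesOrderDetail;

-- Clustered index on SalesOrderDetailID
CREATE CLUSTERED INDEX CIX_SalesOrder_DetailID
    ON SalesOrder (SalesOrderDetailID);

-- Non-clustered index on SalesOrderID, OrderQty, UnitPrice
CREATE NONCLUSTERED INDEX IX_SalesOrder_OrderID
    ON SalesOrder (SalesOrderID, OrderQty, UnitPrice);

With the non-clustered index keyed on SalesOrderID and carrying OrderQty and UnitPrice, queries that ask only for those columns can be answered with an Index Seek, while any select * has to fall back to the clustered index.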

Monday, October 31, 2011

Backup strategies

Full Backup—Simple Mode
When you follow this strategy, the DBA configures the database recovery model to simple and performs periodic Full Database backups. This is the simplest backup strategy and the simplest to restore, but it is also the one that takes the longest to back up, and it is not feasible in most scenarios. You can use this strategy in the following cases:
• Development Databases: databases used in development or testing environments, if you can afford to lose a day's work; otherwise, consider a Full and Log backup strategy.
• Stage Databases: databases used in a data warehouse to store intermediate data extracted from source systems.
• Read-Only Databases: databases that contain only read-only information, for example, subscriber databases that store information for reporting purposes.
• System Databases: master, msdb, and distribution.
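A minimal sketch of this strategy in T-SQL (the database name and backup path are illustrative):

-- Switch to the simple recovery model, then take periodic full backups
ALTER DATABASE StageDB SET RECOVERY SIMPLE;

BACKUP DATABASE StageDB
    TO DISK = N'E:\Backups\StageDB_Full.bak'
    WITH INIT;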

Log & Full Backup
This strategy is similar to the previous one; however, instead of changing the recovery model of the database, you back up the log just before the full backup. This approach has the advantage of capturing not only the data, but also the changes that occurred since the last backup. The benefits of a Full & Log backup strategy are:
1) If the database files and the log are physically separated and the database hard drive fails, you may be able to back up the log and not lose any data.
2) In case of a software failure, human error (end user or DBA), or security breach, the DBA can back up the transaction log first and then restore the database to a point in time, reducing the amount of lost data. Leaving the full recovery model on allows you to back up the transaction log first and then restore the database, plus the transactions in that log, up to the last known good state.
3) If the database has multiple files and one fails, you can first back up the tail of the transaction log (which includes the transactions that occurred since the last log backup), then use the last full backup to restore only the failed file, and apply the transaction log to restore transactional integrity.
4) Transaction log backups can be used in forensic analysis.
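Benefit 2 in practice might look like the following sketch (database name, paths, and the STOPAT time are assumptions):

-- Back up the tail of the log before restoring anything
-- (NO_TRUNCATE allows this even if the database is damaged)
BACKUP LOG SalesDB
    TO DISK = N'E:\Backups\SalesDB_Tail.trn'
    WITH NO_TRUNCATE, INIT;

-- Restore the last full backup, then roll the log forward to just
-- before the bad statement was executed
RESTORE DATABASE SalesDB
    FROM DISK = N'E:\Backups\SalesDB_Full.bak'
    WITH NORECOVERY;

RESTORE LOG SalesDB
    FROM DISK = N'E:\Backups\SalesDB_Tail.trn'
    WITH STOPAT = '2011-10-31 14:30:00', RECOVERY;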

Full & Log Backup
In this scenario, the DBA schedules one full backup and multiple log backups between full backups. This plan has the advantage of reducing the amount of lost information in case of a total server collapse, because you can schedule log backups more frequently than full backups.
Consider the use of the Full & Log backup strategy in 8x5 databases, and non-volatile 24x7 databases. For example, you may schedule nightly Full Database backups and Log backups every four hours.
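A rough sketch of that schedule in T-SQL (names, paths, and times are assumptions; in practice both statements would be scheduled as SQL Server Agent jobs):

-- Nightly full backup (overwrites the previous file)
BACKUP DATABASE SalesDB
    TO DISK = N'E:\Backups\SalesDB_Full.bak'
    WITH INIT;

-- Log backup every four hours, appended to the day's log backup file
BACKUP LOG SalesDB
    TO DISK = N'E:\Backups\SalesDB_Log.trn'
    WITH NOINIT;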

Full – Differential – Log
Sometimes the information stored in a database is so valuable that you want to back it up very often. However, relying on log backups alone is not feasible because, in case of failure, you would have to restore every log backup taken since the last full backup. In this case, the Full – Differential – Log strategy may help.
When using Full, Differential, and Log backups, the DBA schedules full backups occasionally, differential backups often, and log backups very often. Here, occasionally means as frequently as needed to minimize the impact of a full backup, yet often enough to keep differential backups from growing too large; often means as frequently as needed to minimize the impact of restoring too many log backups; and very often means as frequently as needed to minimize data-loss exposure. Some implementations of this strategy may be:
1) Full backup every month, differential backups every night, and log backups every two hours.

2) Full backup every week, differential backups every 6 hours, and logs every fifteen minutes.
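The differential piece is a single extra statement that captures only the extents changed since the last full backup; a sketch (name and path are assumptions):

-- Differential backup: everything changed since the last full backup
BACKUP DATABASE SalesDB
    TO DISK = N'E:\Backups\SalesDB_Diff.bak'
    WITH DIFFERENTIAL, INIT;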

Backup Strategies briefly

  • Combine different backup types to create a backup strategy.
  • Do not back up tempdb.
  • Unless you plan to change its contents, do not back up the model database.
  • Use the Full Backup – Simple Mode strategy only in development, read-only, stage, or system databases.
  • Schedule the Log backup before the Full backup.
  • Use the Log & Full backup strategy in 8x5 databases where you can afford to lose up to one day of work.
  • Consider the use of the Full & Log backup strategy in 8x5 databases, and non-volatile 24x7 databases.
  • Use the Full, Differential and Log strategy in volatile 24x7 databases, when the value of your data requires very frequent log backups, and in nonpartitioned Data marts.
  • Consider the use of a Full - File/Filegroup – Log Backup strategy in Very Large Databases (VLDB) or partitioned Data marts/Data warehouses.
  • Schedule backup operations when database activity is low.
  • Back up first to disk, whenever possible. Consider using a File Server to store the backups.
  • When using a File server to store backups, consider using a private LAN trunk to avoid general network congestion.
  • Do not use the same physical disks that hold the database files or log files for backup purposes.

Monday, October 24, 2011

Index Pages

           Index entries are stored as rows on index pages in a format similar to the format used for data rows on data pages. Index entries store the key values and pointers to lower levels of the index, to the data pages, or to individual data rows.
         Index entries are usually much smaller than a data row in a data page, and index pages are much more densely populated than data pages. If a data row has 200 bytes (including row overhead), there are 10 rows per page. An index on a 15-byte field has about 100 rows per index page (the pointers require 4–9 bytes per row, depending on the type of index and the index level).

Indexes can have multiple levels:
• Root level
• Leaf level
• Intermediate level

     The root level is the highest level of the index. There is only one root page. If an allpages-locked table is very small, so that the entire index fits on a single page, there are no intermediate or leaf levels, and the root page stores pointers to the data pages. Data-only-locked tables always have a leaf level between the root page and the data pages. For larger tables, the root page stores pointers to the intermediate level index pages or to leaf-level pages.

The lowest level of the index is the leaf level. At the leaf level, the index contains a key value for each row in the table, and the rows are stored in sorted order by the index key:
• For clustered indexes on allpages-locked tables, the leaf level is the data. No other level of the index contains one index row for each data row.
• For nonclustered indexes and clustered indexes on data-only-locked tables, the leaf level contains the index key value for each row, a pointer to the page where the row is stored, and a pointer to the rows on the data page.

    The leaf level is the level just above the data; it contains one index row for each data row. Index rows on the index page are stored in key value order.
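For a concrete look at these levels on SQL Server, you can query the index physical-stats DMV in DETAILED mode so the non-leaf levels are reported too; a sketch (the AdventureWorks database and table are assumptions, and the query is meant to be run in that database):

-- Depth, levels, and page/row counts for each index on the table
SELECT index_id, index_level, index_depth, page_count, record_count
FROM   sys.dm_db_index_physical_stats(
           DB_ID(N'AdventureWorks'),
           OBJECT_ID(N'Sales.SalesOrderDetail'),
           NULL, NULL, 'DETAILED');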

Composite Index

When you include more than one column in an index, it is referred to as a composite index. As the number of columns grows, or the number of bytes in the key grows, the effectiveness of the index is reduced. The problem is that the index is sorted by the first column's values, so the second column in the index is more or less only useful if you need the first column as well. Even so, a composite index is often good to have when users are querying with predicates on all of the columns involved. The order of the columns in the index, relative to the columns a query filters on, determines whether the composite index can and will be used. There are a couple of important considerations:
• Which column is most selective? If one column includes unique or mostly unique values, it is possibly a good candidate for the first column. The key is that the first column is the one by which the index is sorted. Searching on the second column only is less valuable (though queries using only the second column can scan the index leaf pages for values).
• Which column is used most often without the other columns? One composite index can be useful to several different queries, even if only the first column of the index is all that is being used in those queries.
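A small sketch of both points, in the AdventureWorks style used elsewhere on this page (the index name, column choice, and literal values are assumptions):

-- Composite index keyed on the most selective, most-used column first
CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_OrderID_ProductID
    ON Sales.SalesOrderDetail (SalesOrderID, ProductID, OrderQty);

-- Filters on the leading column: the optimizer can seek on the index
SELECT SalesOrderID, ProductID, OrderQty
FROM   Sales.SalesOrderDetail
WHERE  SalesOrderID = 43659 AND ProductID = 777;

-- Filters only on the second column: at best a scan of the index leaf pages
SELECT SalesOrderID, ProductID, OrderQty
FROM   Sales.SalesOrderDetail
WHERE  ProductID = 777;

Because all three columns are in the index key, both queries are also covered: they can be answered from the index pages without touching the data rows.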

Composite indexes have these advantages:
• A composite index provides opportunities for index covering.
• If queries provide search arguments on each of the keys, the composite index requires fewer I/Os than the same query using an index on any single attribute.
• A composite index is a good way to enforce the uniqueness of multiple attributes.

Good choices for composite indexes are:
• Lookup tables
• Columns that are frequently accessed together
• Columns used for vector aggregates
• Columns that make a frequently used subset from a table with very wide rows

The disadvantages of composite indexes are:
• Composite indexes tend to have large entries. This means fewer index entries per index page and more index pages to read.
• An update to any attribute of a composite index causes the index to be modified. The columns you choose should not be those that are updated often.

Poor choices are:
• Indexes that are nearly as wide as the table

• Composite indexes where only a minor key is used in the where clause

Tuesday, October 4, 2011

Indexing - Useful tips

Consider using a clustered index in the following situations:
  • The physical ordering supports the range retrievals of important queries—that is, queries that use BETWEEN and LIKE.
  • Few duplicate values mean that an equality test (=) returns few rows.
  • Many duplicate values mean that an equality test (=) returns many rows.
  • The clustered index key is used in the ORDER BY clause of critical queries.
  • The clustered index supports the GROUP BY clause of critical queries.
  • For a given row in the outer table of a join, there are few rows that match in the inner table. A clustered index on the join column in the inner table will be beneficial.
  • For a given row in the outer table of a join, there are many rows that match in the inner table. A clustered index on the join column in the inner table will be beneficial.
When to avoid using a clustered index:
  • On a volatile column. A volatile column is a column that is updated frequently. This would result in the data row moving around the table repeatedly.
Some general guidelines are as follows:
  • Do not create an index on a column that is not very selective. An example of this would be a column that contained a status flag containing two or three values. It is unlikely that such an index would be used by the query optimizer.
  • Be careful when creating indexes with large keys. Fewer keys can be held in an index page, resulting in many index pages and deeper indexes. Take care with a large key in a clustered index. This will be used as the pointer in all the non-clustered indexes on the table.
  • Regularly check the levels of internal and external page fragmentation with DBCC SHOWCONTIG. Tidy up by rebuilding indexes. Make sure that there is enough free space in the database to rebuild clustered indexes. Another approach is to use the Database Maintenance Wizard.
  • Consider using DBCC INDEXDEFRAG on tables where there is little opportunity for maintenance—for example, a 24 × 7 system.
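A hedged sketch of that maintenance routine, using the same period-appropriate DBCC commands (the table name and index ID are assumptions):

DBCC SHOWCONTIG ('Sales.SalesOrderDetail');          -- report internal/external fragmentation
DBCC DBREINDEX  ('Sales.SalesOrderDetail');          -- offline rebuild of all indexes on the table
DBCC INDEXDEFRAG (0, 'Sales.SalesOrderDetail', 1);   -- online defrag of index_id 1 (the clustered index)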
Consider using a non-clustered index in the following situations:
  • Few duplicate values mean that an equality test (=) returns few rows.
  • The non-clustered index key is used in the ORDER BY clause of critical queries.
  • The non-clustered index supports the GROUP BY clause of critical queries.
  • For a given row in the outer table of a join, there are few rows that match in the inner table. A non-clustered index on the join column in the inner table will be beneficial.
  • A critical query can be efficiently covered.
  • Many applications will require the selection of a row by the primary key. This is a single-row selection and therefore would normally benefit from the creation of an index containing the same columns as the primary key. Since it is not common to request ranges of primary keys, a non-clustered index is probably the best option. If a primary key constraint is created, the index will be automatically created; it is recommended that this be a non-clustered index.
Avoid using a non-clustered index:
  •  When a query returns many rows, such as a range retrieval, or when there are many duplicate values returned by an equality test. Also, if, for a given row in the outer table of a join, there are many rows that match in the inner table, a non-clustered index on the join column in the inner table will not be beneficial.
  • Avoid using a non-clustered index on a volatile column. The result may not be as unfavorable as using a clustered index, since the data row will not move; however, the index will still have to be maintained.

Monday, September 12, 2011

Understanding deadlocks

              A deadlock occurs when two or more tasks permanently block each other by each task having a lock on a resource which the other tasks are trying to lock. The following graph presents a high level view of a deadlock state where:
  • Task T1 has a lock on resource R1 (indicated by the arrow from R1 to T1) and has requested a lock on resource R2 (indicated by the arrow from T1 to R2).
  • Task T2 has a lock on resource R2 (indicated by the arrow from R2 to T2) and has requested a lock on resource R1 (indicated by the arrow from T2 to R1).
  • Because neither task can continue until a resource is available and neither resource can be released until a task continues, a deadlock state exists.
[Diagram: tasks T1 and T2 in a deadlock state]

The SQL Server Database Engine automatically detects deadlock cycles within SQL Server. The Database Engine chooses one of the sessions as a deadlock victim, and its current transaction is terminated with an error to break the deadlock.

Best practices to reduce deadlock situations:
1. Keep transactions short.
2. Keep transactions in a single batch.
3. Access the least amount of data possible in the transaction.
4. Where appropriate, use a lower isolation level.
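When the Database Engine picks a session as the victim, that session's transaction fails with error 1205. A common complement to the practices above is to retry the transaction a few times; a minimal sketch (the transaction body itself is omitted):

DECLARE @retry INT = 3;

WHILE @retry > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
            -- ... the short, single-batch transaction body goes here ...
        COMMIT TRANSACTION;
        BREAK;                           -- success: leave the loop
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        IF ERROR_NUMBER() = 1205         -- this session was the deadlock victim
            SET @retry = @retry - 1;     -- try the transaction again
        ELSE
            SET @retry = 0;              -- other errors: stop (a real routine would re-raise)
    END CATCH
END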






Saturday, September 10, 2011

Database Engine Tuning Advisor (DTA) - Basics



          The Database Engine Tuning Advisor (DTA) is the greatly enhanced replacement for the Index Tuning Wizard tool that shipped with previous versions of SQL Server. DTA plays an important role in an overall performance solution, letting you leverage the query optimizer to receive recommendations on indexes, indexed views, or partitions that could improve performance.

Open a new session (Ctrl+N).
You will then get a new pane with two tabs:
1. General 2. Tuning Options

In the General tab you need to enter values for a few things, such as the session name, workload (file or table), database name, and tables. You can save the tuning log to a particular file.
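When the workload is a file, it is simply a script of representative statements for DTA to analyze against the selected database; a small illustrative example (the queries and the AdventureWorks objects are assumptions):

-- workload.sql: statements DTA will tune against
SELECT SalesOrderID, OrderQty, UnitPrice
FROM   Sales.SalesOrderDetail
WHERE  ProductID = 714;

SELECT ProductID, SUM(OrderQty) AS TotalQty
FROM   Sales.SalesOrderDetail
GROUP BY ProductID;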

In the Tuning Options tab, you will get several options, such as Limit tuning time (stop at), Physical design structures (PDS) to use in database, Partitioning strategy to employ, and Physical design structures (PDS) to keep in database.

PDS to use in database – Indexed views; Indexes; Nonclustered indexes; and Indexes and indexed views (the objects to be used in the tuning strategy).

Partitioning strategy to employ – No partitioning, Full partitioning, and Aligned partitioning.

PDS to keep in database – Do not keep any existing PDS; Keep all existing PDS; Keep aligned partitioning; Keep indexes only; and Keep clustered indexes only (the objects to be kept intact).

Once you start the analysis, you will see another three options in the pane.

Progress

Recommendations – estimated improvement 0%, partition recommendations, and index recommendations (database name, object name, recommendation (create/alter/drop), target of recommendation, details, size, and definition (the query to make the changes)).

Reports – tuning summary (date, time, server, database(s), workload file, max tuning time, time taken, expected percentage of improvement, max space, used space, number of events tuned) and tuning reports.

SQL Server profiler


          Microsoft SQL Server Profiler is a graphical user interface to SQL Trace for monitoring an instance of the SQL Server Database Engine or Analysis Services. You can capture and save data about each event to a file or table to analyze later. For example, you can monitor a production environment to see which stored procedures are affecting performance by executing too slowly.
                   Profiler allows you to monitor and record what is happening inside the database engine. This is accomplished by performing a trace, which is a record of data that has been captured about events. Traces are stored in a table, a trace log file, or both and they can be either shared or private.
                             The actions you will be monitoring, called events, are anything that happens to the database engine, such as a failed login or a completed query. These events are logically grouped into event classes. The following event categories are available:

Cursors, Database, Errors and Warnings, Locks, Objects, Performance, Scans, Security Audit, Server, Sessions, Transactions, TSQL, Stored Procedures, OLEDB, Broker, Full Text, Deprecation, User Configurable, and Progress Report.
Using SQL Server Profiler, you can do the following:
  • Create a trace that is based on a reusable template
  • Watch the trace results as the trace runs
  • Store the trace results in a table
  • Start, stop, pause, and modify the trace results as necessary
  • Replay the trace results
          Use SQL Server Profiler to monitor only the events in which you are interested. If traces are becoming too large, you can filter them based on the information you want, so that only a subset of the event data is collected. Monitoring too many events adds overhead to the server and the monitoring process, and can cause the trace file or trace table to grow very large, especially when the monitoring process takes place over a long period of time.
Open a new trace (Ctrl+N).
It will then ask for the server type, server name, and authentication details. After that you will get the Trace Properties window, which has two tabs:
1. General 2. Events Selection
In the General tab you need to give inputs for different options, such as the trace name, trace provider name, trace provider type, version, template, save to file (set max size), save to table, and enable trace stop time.
In the second tab you need to select events and columns. For the Tuning template, you get only stored procedure and T-SQL events, with a few columns such as TextData, Duration, SPID, DatabaseID, DatabaseName, ObjectType, and LoginName.
Profiler trace templates – Standard (default), Blank, SP_Counts, TSQL, TSQL_Duration, TSQL_Grouped, TSQL_Replay, TSQL_SPs, and Tuning.
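If a trace was saved to a file, you can also load it into a table afterwards and query it with fn_trace_gettable; a sketch (the file path and table name are assumptions):

-- Load the saved trace file into a table, then look at the slowest statements
SELECT *
INTO   TraceResults
FROM   sys.fn_trace_gettable(N'C:\Traces\Tuning.trc', DEFAULT);

SELECT TOP (20) TextData, Duration, SPID, DatabaseName
FROM   TraceResults
ORDER BY Duration DESC;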
          SQL Server provides a special type of trace called a replay trace, which enables you to capture a workload that you can then replay on a test system. Profiler contains a multithreaded subsystem to handle the replay of a trace, and there are specific requirements that must be met for the replay to succeed: you must use SQL Server authentication only and capture the specific events defined in the Replay template. The replay cannot handle Windows authentication because it does not have the ability to impersonate a Windows user.
         
          When Profiler replays a trace, it spawns multiple execution threads that use the same security context as the original execution. Threads are also synchronized to provide a realistic duplication of the workload. The start time, end time, and system process ID (SPID) columns enable Profiler to re-create the exact interleaving of queries that occurred on the original system.

Columns and events required for a replay trace
RPC Starting, audit logout, prepare sql, exec prepared sql, RPC output parameter, sql:batch starting, ExistingConnection, CursorOpen, audit login, cursor prepare, and cursor execute.

Event
          An event is an action generated within an instance of SQL Server Database Engine. Examples of these are:

  • Login connections, failures, and disconnections.
  • Transact-SQL SELECT, INSERT, UPDATE, and DELETE statements.
  • Remote procedure call (RPC) batch status.
  • The start or end of a stored procedure.
  • The start or end of statements within stored procedures.
  • The start or end of an SQL batch.
  • An error written to the SQL Server error log.
  • A lock acquired or released on a database object.
  • An opened cursor.
  • Security permission checks.


All of the data generated by an event is displayed in the trace in a single row. This row is intersected by data columns that describe the event in detail.

Event Class
An event class is a type of event that can be traced. The event class contains all of the data that can be reported by an event. Examples of event classes are the following:

  • SQL:BatchCompleted
  • Audit Login
  • Audit Logout
  • Lock:Acquired
  • Lock:Released


Template
          A template defines the default configuration for a trace. Specifically, it includes the event classes you want to monitor with SQL Server Profiler. For example, you can create a template that specifies the events, data columns, and filters to use. A template is not executed, but rather is saved as a file with a .tdf extension. Once saved, the template controls the trace data that is captured when a trace based on the template is launched.

Trace
          A trace captures data based on selected event classes, data columns, and filters. For example, you can create a trace to monitor exception errors. To do this, you select the Exception event class and the Error, State, and Severity data columns. Data from these three columns needs to be collected in order for the trace results to provide meaningful data. You can then run a trace, configured in such a manner, and collect data on any Exception events that occur in the server. Trace data can be saved, or used immediately for analysis. Traces can be replayed at a later date, although certain events, such as Exception events, are never replayed. You can also save the trace as a template to build similar traces in the future.