Monday, September 28, 2015

SET STATISTICS IO

(Source: microsoft.com)


Output item and meaning:
Table: Name of the table.
Scan count: Number of seeks/scans started after reaching the leaf level in any direction to retrieve all the values to construct the final dataset for the output.
  • Scan count is 0 if the index used is a unique index or clustered index on a primary key and you are seeking for only one value. For example WHERE Primary_Key_Column = <value>.
  • Scan count is 1 when you are searching for one value using a non-unique clustered index which is defined on a non-primary key column. This is done to check for duplicate values for the key value that you are searching for. For example WHERE Clustered_Index_Key_Column = <value>.
  • Scan count is N when N is the number of different seeks/scans started towards the left or right side at the leaf level after locating a key value using the index key.
logical reads: Number of pages read from the data cache.
physical reads: Number of pages read from disk.
read-ahead reads: Number of pages placed into the cache for the query.
lob logical reads: Number of text, ntext, image, or large value type (varchar(max), nvarchar(max), varbinary(max)) pages read from the data cache.
lob physical reads: Number of text, ntext, image, or large value type pages read from disk.
lob read-ahead reads: Number of text, ntext, image, or large value type pages placed into the cache for the query.
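
For example, a minimal way to use the setting in a session (the AdventureWorks2012 table below is only an illustration):

-- Turn per-statement I/O reporting on for this session
SET STATISTICS IO ON;

-- Any statement run now adds a message per table touched, along the lines of:
-- "Table 'Department'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, ..."
SELECT GroupName, Name, DepartmentID
FROM AdventureWorks2012.HumanResources.Department
WHERE DepartmentID = 1;

-- Turn it back off when done
SET STATISTICS IO OFF;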

Checkpoint and Lazy Writer

(Source: sqlxdetails.com)

CHECKPOINT's goal is to shorten database recovery time (e.g. after a server crash), therefore increasing availability. It keeps the data files from lagging too far behind the transaction log. Recovery of every database starts at the last checkpoint and rolls forward all the transaction log records from that point on. If the last checkpoint was a long time ago, the amount of log to roll forward can be huge and recovery can take considerable time. During that time the database is unavailable, so you have longer downtime. More frequent checkpoints mean less log to roll forward, resulting in faster recovery, but checkpoints that are too frequent make buffer pool caching less efficient and can hurt performance (e.g. you don't want to checkpoint every second). The frequency of automatic checkpoints is determined by the recovery interval option and the log generation rate, and is usually around every 1 minute. Some T-SQL commands, such as BACKUP DATABASE, also perform a checkpoint internally; such checkpoints are called "internal". You can also trigger a manual checkpoint by running the CHECKPOINT T-SQL command, as sketched below.
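
A minimal sketch of both (the database name is illustrative; 'recovery interval (min)' is a server-wide advanced option):

-- Issue a manual checkpoint in the current database
USE AdventureWorks2012;
CHECKPOINT;

-- View the recovery interval that drives automatic checkpoint frequency
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'recovery interval (min)';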
The LAZY WRITER's purpose is to release buffer pool memory (taken by pages cached in the buffer pool) when memory pressure occurs. When more memory is needed (e.g. to bring new pages into the cache), the lazy writer responds to the memory pressure by releasing the "coldest" pages from the buffer pool, making room for new pages to come in. It is normal to see the lazy writer occasionally make room this way. But if the lazy writer constantly has a lot of work to do and starts purging pages that are not old at all (the 'Buffer Node:Page Life Expectancy' perfmon counter stays below DataCacheSizeInGB / 4 GB * 300), you have a problem with buffer cache memory. You do not want pages to flow through the buffer cache like a rapid river; you want them to stay there as long as possible and be reused, read, written and rewritten in memory rather than on disk, which is slow compared to memory. Low page life expectancy defeats the purpose of buffer caching and hurts performance.
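
One way to check the counter from T-SQL is through sys.dm_os_performance_counters (a sketch; the threshold you compare against depends on your data cache size, as described above):

-- Page Life Expectancy, instance-wide (Buffer Manager) and per NUMA node (Buffer Node)
SELECT [object_name], [counter_name], [cntr_value] AS page_life_expectancy_seconds
FROM sys.dm_os_performance_counters
WHERE [counter_name] LIKE 'Page life expectancy%'
  AND ([object_name] LIKE '%Buffer Manager%' OR [object_name] LIKE '%Buffer Node%');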
Both write in-memory pages to the data files on disk. But which pages, when, and whether they release memory or not: that is the difference!
CHECKPOINT writes only dirty pages to the disk (dirty = changed in memory since the last checkpoint, not yet written/checkpointed to disk), making them "clean". Checkpoint does not release any memory; the pages STAY in memory, they are not removed from the buffer pool!
LAZY WRITER looks for the least recently used ("cold" = least recently read or written, not accessed recently) pages in the buffer pool and releases the memory they occupy. The lazy writer releases both dirty and clean pages. Clean pages can be released without writing to disk, but dirty pages must first be written ("flushed") to the disk and become "clean" before their buffer pool memory can be released. So the total number of pages the lazy writer releases can be higher than the number of pages it writes to the data files, because clean pages can be released without writing them to disk. The net result of the lazy writer is less buffer pool memory used, and therefore more memory available for fresh pages in the buffer pool.

There is another difference: the checkpoint process is more efficient at writing to disk because it can group adjacent pages into larger disk IOs, e.g. a 128 KB IO size; internally it uses the WriteFileGather Windows API function. The lazy writer can only write 8 KB pages. Therefore checkpoint disk throughput is much better than the lazy writer's.

Deadlocks

(source: simple-talk.com)
A deadlock occurs when two or more sessions are waiting for each other, in such a way that none can complete. SQL Server's lock monitor has a deadlock detector that periodically checks the locks to see if there are any circular locking chains. If it finds any, it selects one of the sessions associated with a suspended thread, kills it, rolls back its transaction and releases its locks. The killed session, known as the deadlock victim, receives error 1205.
The lock monitor picks the deadlock victim based, firstly, on the setting of DEADLOCK_PRIORITY for each session and, secondly (in the event of a tie), on the amount of work that it will take to roll back each of the open transactions. DEADLOCK_PRIORITY is a session-scoped setting that establishes the relative importance of the session completing its work should it become embroiled in a deadlock. It can be set to HIGH, NORMAL or LOW, with NORMAL being the default.
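
For example, a long-running reporting session can volunteer itself as the victim (a minimal sketch):

-- Prefer this session as the deadlock victim over sessions running at NORMAL or HIGH
SET DEADLOCK_PRIORITY LOW;

-- ... reporting queries run here ...

-- Restore the default for the rest of the session
SET DEADLOCK_PRIORITY NORMAL;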

Bookmark lookup deadlock
Bookmark lookup deadlocks are one of the most common deadlocks in SQL Server. Bookmark lookup deadlocks generally have a SELECT statement as the victim, and an INSERT, UPDATE, or DELETE statement as the other contributing process to the deadlock. They occur partly as a general consequence of SQL Server's pessimistic locking mechanisms for concurrency, but mainly due to the lack of an appropriate covering index for the SELECT operation.
When a lookup operation occurs, the database engine takes additional shared locks on the rows or pages needed from the table. These locks are held for the duration of the SELECT operation, or until lock escalation is triggered to increase the lock granularity from row or page to table. When the data-changing session executes, it acquires an exclusive lock on the row or page of the clustered index or table. At the same time, the SELECT operation acquires a shared lock on the non-clustered index. The data-changing operation requires an exclusive lock on the non-clustered index to complete the modification, and the SELECT operation requires a shared lock on the clustered index, or table, to perform the bookmark lookup. Shared locks and exclusive locks are incompatible, so if the data-changing operation and the SELECT operation affect the same rows, a deadlock results.
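
The usual fix is a covering index, so the SELECT never has to follow the bookmark into the clustered index. A sketch under assumed, illustrative names (dbo.Orders, its columns, and the index name are not from the text above):

-- Cover the SELECT's predicate and output columns in one non-clustered index
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, TotalDue);

-- This query can now be answered entirely from the non-clustered index,
-- so it no longer needs shared locks on the clustered index for a lookup.
SELECT CustomerID, OrderDate, TotalDue
FROM dbo.Orders
WHERE CustomerID = 42;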

Handling Deadlocks to Prevent Errors
In most cases, the same issues that cause severe blocking in the database, such as poor database design, lack of indexing, poorly designed queries, inappropriate isolation levels and so on, are also the common causes of deadlocking. An important part of application and database design is defensive programming: a technique that anticipates and handles exceptions as part of the general code base for an application or database. Defensive programming to handle deadlock exceptions can be implemented in two different ways:
  • database-side, through the use of T-SQL TRY…CATCH blocks
  • application-side, through the use of application TRY…CATCH blocks.

In either case, proper handling of the 1205 exception raised by SQL Server for the deadlock victim can help avoid UnhandledException errors in the application and the ensuing end-user phone calls to Help Desk or Support.
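
A minimal database-side sketch of such handling, retrying a few times when error 1205 is caught (table and values are illustrative; THROW assumes SQL Server 2012 or later):

DECLARE @retry INT = 3;

WHILE @retry > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        -- The work that might be chosen as a deadlock victim
        UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderID = 1;

        COMMIT TRANSACTION;
        BREAK;  -- success, stop retrying
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;

        IF ERROR_NUMBER() = 1205 AND @retry > 1
        BEGIN
            SET @retry = @retry - 1;       -- deadlock victim: wait briefly and retry
            WAITFOR DELAY '00:00:00.100';
        END
        ELSE
        BEGIN
            THROW;                         -- not a deadlock, or out of retries: re-raise
        END
    END CATCH
END;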

OPENROWSET

Includes all connection information that is required to access remote data from an OLE DB data source. This method is an alternative to accessing tables in a linked server and is a one-time, ad hoc method of connecting and accessing remote data by using OLE DB. The OPENROWSET function can be referenced in the FROM clause of a query as if it were a table name. The OPENROWSET function can also be referenced as the target table of an INSERT, UPDATE, or DELETE statement, subject to the capabilities of the OLE DB provider. Although the query might return multiple result sets, OPENROWSET returns only the first one.
OPENROWSET also supports bulk operations through a built-in BULK provider that enables data from a file to be read and returned as a rowset.

Arguments
'provider_name' Is a character string that represents the friendly name (or PROGID) of the OLE DB provider as specified in the registry. provider_name has no default value.
'datasource' Is a string constant that corresponds to a particular OLE DB data source. datasource is the DBPROP_INIT_DATASOURCE property to be passed to the IDBProperties interface of the provider to initialize the provider. Typically, this string includes the name of the database file, the name of a database server, or a name that the provider understands to locate the database or databases.
'user_id' Is a string constant that is the user name passed to the specified OLE DB provider. user_id specifies the security context for the connection and is passed in as the DBPROP_AUTH_USERID property to initialize the provider. user_id cannot be a Microsoft Windows login name.
'password' Is a string constant that is the user password to be passed to the OLE DB provider. password is passed in as the DBPROP_AUTH_PASSWORD property when initializing the provider. password cannot be a Microsoft Windows password.
'provider_string' Is a provider-specific connection string that is passed in as the DBPROP_INIT_PROVIDERSTRING property to initialize the OLE DB provider. provider_string typically encapsulates all the connection information required to initialize the provider.
BULK Uses the BULK rowset provider for OPENROWSET to read data from a file. In SQL Server, OPENROWSET can read from a data file without loading the data into a target table. This lets you use OPENROWSET with a simple SELECT statement.
SELECT a.*
FROM OPENROWSET('SQLNCLI', 'Server=Seattle1;Trusted_Connection=yes;',
     'SELECT GroupName, Name, DepartmentID
      FROM AdventureWorks2012.HumanResources.Department
      ORDER BY GroupName, Name') AS a;
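
The BULK form can likewise be used in the FROM clause; a small sketch reading an entire file as a one-row, one-column rowset (the file path is illustrative; with a format file the same function can return a regular multi-row rowset):

SELECT BulkColumn
FROM OPENROWSET(BULK N'C:\data\invoice.txt', SINGLE_CLOB) AS document;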

Bulk data Import and Export

SQL Server supports exporting data in bulk (bulk data) from a SQL Server table and importing bulk data into a SQL Server table or non-partitioned view. Bulk importing and bulk exporting are essential for transferring data efficiently between SQL Server and heterogeneous data sources. Bulk exporting refers to copying data from a SQL Server table to a data file. Bulk importing refers to loading data from a data file into a SQL Server table. For example, you can export data from a Microsoft Excel application to a data file and then bulk import that data into a SQL Server table.


Method: bcp utility
Description: A command-line utility (Bcp.exe) that bulk exports and bulk imports data and generates format files.
Imports data: Yes. Exports data: Yes.

Method: BULK INSERT statement
Description: A Transact-SQL statement that imports data directly from a data file into a database table or nonpartitioned view.
Imports data: Yes. Exports data: No.

Method: INSERT ... SELECT * FROM OPENROWSET(BULK...) statement
Description: A Transact-SQL statement that uses the OPENROWSET bulk rowset provider to bulk import data into a SQL Server table by specifying the OPENROWSET(BULK…) function to select data in an INSERT statement.
Imports data: Yes. Exports data: No.

BULK INSERT

Imports a data file into a database table or view in a user-specified format in SQL Server.

Arguments
'data_file'
Is the full path of the data file that contains data to import into the specified table or view. BULK INSERT can import data from a disk (including network, floppy disk, hard disk, and so on).
data_file must specify a valid path from the server on which SQL Server is running. If data_file is a remote file, specify the Universal Naming Convention (UNC) name. A UNC name has the form \\Systemname\ShareName\Path\FileName. For example, \\SystemX\DiskZ\Sales\update.txt
BATCHSIZE = batch_size
Specifies the number of rows in a batch. Each batch is copied to the server as one transaction. If this fails, SQL Server commits or rolls back the transaction for every batch. By default, all data in the specified data file is one batch. 
CHECK_CONSTRAINTS
Specifies that all constraints on the target table or view must be checked during the bulk-import operation. Without the CHECK_CONSTRAINTS option, any CHECK and FOREIGN KEY constraints are ignored, and after the operation the constraints on the table are marked as not-trusted. UNIQUE and PRIMARY KEY constraints are always enforced. When importing into a character column that is defined with a NOT NULL constraint, BULK INSERT inserts a blank string when there is no value in the text file.
FIELDTERMINATOR = 'field_terminator'
Specifies the field terminator to be used for char and widechar data files. The default field terminator is \t (tab character).
FIRSTROW = first_row
Specifies the number of the first row to load. The default is the first row in the specified data file. FIRSTROW is 1-based.
KEEPIDENTITY
Specifies that identity value or values in the imported data file are to be used for the identity column. If KEEPIDENTITY is not specified, the identity values for this column are verified but not imported and SQL Server automatically assigns unique values based on the seed and increment values specified during table creation. If the data file does not contain values for the identity column in the table or view, use a format file to specify that the identity column in the table or view is to be skipped when importing data; SQL Server automatically assigns unique values for the column.
MAXERRORS = max_errors
Specifies the maximum number of syntax errors allowed in the data before the bulk-import operation is canceled. Each row that cannot be imported by the bulk-import operation is ignored and counted as one error. If max_errors is not specified, the default is 10.
ROWTERMINATOR = 'row_terminator'
Specifies the row terminator to be used for char and widechar data files. The default row terminator is \r\n (newline character).
TABLOCK
Specifies that a table-level lock is acquired for the duration of the bulk-import operation. A table can be loaded concurrently by multiple clients if the table has no indexes and TABLOCK is specified. By default, locking behavior is determined by the table option table lock on bulk load. Holding a lock for the duration of the bulk-import operation reduces lock contention on the table and in some cases can significantly improve performance.

Note     When using a format file with BULK INSERT, you can specify up to 1024 fields only. This is the same as the maximum number of columns allowed in a table. If you use BULK INSERT with a data file that contains more than 1024 fields, BULK INSERT generates the 4822 error. The bcp utility does not have this limitation, so for data files that contain more than 1024 fields, use the bcp command.
Requires INSERT and ADMINISTER BULK OPERATIONS permissions.

BULK INSERT AdventureWorks2012.Sales.SalesOrderDetail
   FROM 'f:\orders\lineitem.tbl'
   WITH
      (
         FIELDTERMINATOR = ' |',
         ROWTERMINATOR = ' |\n'
      );
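
A variation of the same load sketching several of the options described above (the file path and option values are illustrative):

BULK INSERT AdventureWorks2012.Sales.SalesOrderDetail
   FROM 'f:\orders\lineitem.tbl'
   WITH
      (
         FIELDTERMINATOR = ' |',
         ROWTERMINATOR = ' |\n',
         FIRSTROW = 2,             -- skip a header row
         BATCHSIZE = 50000,        -- commit every 50,000 rows as one transaction
         CHECK_CONSTRAINTS,        -- enforce CHECK and FOREIGN KEY constraints during the load
         TABLOCK                   -- take a table-level lock for the duration of the load
      );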

Recompile

When a procedure is compiled for the first time or recompiled, the procedure’s query plan is optimized for the current state of the database and its objects. If a database undergoes significant changes to its data or structure, recompiling a procedure updates and optimizes the procedure’s query plan for those changes. Automatic recompiling occurs whenever SQL Server is restarted. It also occurs if an underlying table referenced by the procedure has undergone physical design changes.
Another reason to force a procedure to recompile is to counteract the "parameter sniffing" behavior of procedure compilation. When SQL Server executes procedures, any parameter values that are used by the procedure when it compiles are included as part of generating the query plan. If these values represent the typical ones with which the procedure is subsequently called, then the procedure benefits from the query plan every time that it compiles and executes. If parameter values on the procedure are frequently atypical, forcing a recompile of the procedure and a new plan based on different parameter values can improve performance.
SQL Server features statement-level recompilation of procedures. When SQL Server recompiles stored procedures, only the statement that caused the recompilation is compiled, instead of the complete procedure. If certain queries in a procedure regularly use atypical or temporary values, procedure performance can be improved by using the RECOMPILE query hint inside those queries. If you have nested stored procedures, the recompile hint only applies to code in the outermost procedure. You can still get parameter sniffing on any calls to sub-procedures within the stored procedure itself.
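
A sketch of the statement-level hint inside a procedure (the procedure, table, and column names are illustrative):

CREATE PROCEDURE dbo.usp_GetOrdersByStatus
    @Status NVARCHAR(20)
AS
BEGIN
    -- Only this statement is recompiled on every execution;
    -- the rest of the procedure keeps its cached plan.
    SELECT OrderID, CustomerID, OrderDate
    FROM dbo.Orders
    WHERE Status = @Status
    OPTION (RECOMPILE);
END;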

Using RECOMPILE in the stored procedure header is pretty drastic — the procedure won’t cache an execution plan when it runs. This means:
  • No execution plans in cache to review
  • No execution stats recorded in sys.dm_exec_query_stats
  • No execution stats recorded in sys.dm_exec_procedure_stats

There are a few ways to ensure recompilation of a stored procedure (the first option is sketched after this list):
  • using WITH RECOMPILE,
  • making the stored procedure dynamic (think exec()),
  • marking the proc for recompile with sp_recompile,
  • changing the schema that a cached query plan relies upon,
  • calling DBCC FREEPROCCACHE,
  • at the query level, recompiling an individual statement within a proc with the RECOMPILE query hint (SQL 2008).
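
A sketch of the first option, both at procedure-definition time and at call time (object names are illustrative):

-- No plan is ever cached for this procedure; it compiles on every execution
CREATE PROCEDURE dbo.usp_ReportSales
    @Region NVARCHAR(50)
WITH RECOMPILE
AS
BEGIN
    SELECT OrderID, OrderDate, TotalDue
    FROM dbo.Orders
    WHERE Region = @Region;
END;
GO

-- Or leave the procedure unchanged and force a fresh plan for this one call only
EXECUTE dbo.usp_ReportSales @Region = N'Northwest' WITH RECOMPILE;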

Factors in Recompilation
Besides the hard factors listed above, what causes stored procedure recompilation? Well, lots of things. Some of these are interwoven with the list above, but I want to re-present them because it might not be obvious.
  • Inserting or deleting lots of data (data density in indexes & tables often controls query plans)
  • Rebuilding indexes (a change to underlying objects)
  • Creating/dropping temp tables (again, a change to underlying objects)
  • Query plan ages out (think: not used recently, and SQL Server wants to clean up memory use)

The SP:Recompile event identifies the reason for the recompilation with an integer code in the EventSubClass column.


-- Mark the stored procedure test_rec for recompilation the next time it is executed
EXEC sp_recompile 'test_rec';