Knowing SQL Server: Transaction logs

Strategies for controlling excessive logging

If your organization has zero tolerance to any potential data loss, then you'll have no choice but to perform index rebuild operations in the FULL recovery model, and plan your log size and growth appropriately. If your Service Level Agreements (SLAs) and Operational Level Agreements (OLAs) allow some potential for data loss, then switching to BULK LOGGED recovery at the start of index rebuild can minimize the amount of space required to rebuild the index.

If your database is in FULL recovery, and is subject to frequent index reorganization operations, then you might need to consider increasing the frequency with which you take log backups, especially during the time that the indexes are being rebuilt, in order to control the size of the log. Regardless of the frequency of log backups, the log will be at least as large as the largest index being rebuilt.

Regardless of the recovery model in use, one can minimize the impact of index maintenance operations on the transaction log by reorganizing rather than rebuilding, if possible, and by only maintaining those indexes that really need it.

It's also worth noting that rebuilding small indexes is generally not worthwhile. The commonly cited threshold is around 1,000 pages. These values are based on recommendations made by Paul Randal while he managed the storage engine development team at Microsoft, and which are documented in Books Online.

Lack of log space reuse

If you suspect log growth is being caused by the log space not being reused, your first job is to find out what's preventing reuse. Start by querying sys.databases, and see what the value of the column log_reuse_wait_desc is for the database mentioned in the error message. The value of the log_reuse_wait_desc column will show the current reason why log space cannot be reused. It is possible more than one thing is preventing log reuse. The sys.databases view will only show one of the reasons. It is therefore possible to resolve one problem, query sys.databases again and see a different log_reuse_wait reason.

FULL recovery model without log backups

If the value returned for log_reuse_wait_desc, from the previous sys.databases

query, is Log Backup, then you are suffering from one of the most common causes of a full or large transaction log, namely operating a database in the FULL recovery model

(or less common, but still possible, the BULK_LOGGED recovery model), without taking

transaction log backups. It varies, depending on the edition of SQL Server that is installed, but the model database is configured in FULL recovery mode at installation, for many editions. Since the model database is a template database that is used to create new databases in SQL Server, this configuration is inherited from model, by the new database.

Using the FULL recovery model is a recommended practice for most production database environments, since it allows for point-in-time recovery of the database, minimizing data loss in the event of a disaster. However, a common mistake is then to adopt a backup strategy consisting entirely of full (and possibly differential) database backups without taking frequent transaction log backups. There are two big problems with this strategy:

Taking full database backups only protects the contents of the data file, not the log file. The only way to properly protect the data that has changed since the last full or differential backup, which will be required for point-in-time restores, is to perform a log backup.

2. Full database backups do not truncate the transaction log. Only a log backup will

cause the log file to be truncated. Without the latter, space in the log file will never be marked for reuse, and the log file will constantly grow in size.

Active transactions

If the value returned for log_reuse_wait_desc is ACTIVE_TRANSACTION, then you are suffering from the second most common cause of a full or large transaction log in SQL Server: long-running or uncommitted transactions. Long-running transactions in a database delay truncation of the VLFs that contain the log records generated after the start of the transaction, including the log records generated by changes to data in the database by other sessions, even when those changes have been committed.

Additionally, the amount of space required by a long-running transaction will be increased by space reservations for "compensation log records," which are the log records that would be generated if the transaction were rolled back in the system. This reservation is required to ensure that the transaction can be reverted successfully without running out of log space during the rollback.

Another common cause of the Active Transaction value for log_reuse_wait_desc is the presence of "orphaned" explicit transactions that somehow never got committed. Applications that allow for user input inside a transaction are especially prone to this kind of problem.

Long-running transactions

One of the most common operations that results in a long-running transaction, which also generates large numbers of log records in a database is archiving or purging of data from a database. Depending on the number of rows that exist in the date range to be deleted, this can easily become a long-running transaction that will cause transaction log growth issues, even when the database is using the SIMPLE recovery model. The problem can be exacerbated by the presence of cascading FOREIGN KEY constraints or auditing triggers. If the table from which data is being deleted is referenced by other tables, using FOREIGN KEY constraints that are designed to CASCADE ON DELETE, then details of the rows that are deleted through the cascading constraint will also be logged. If the table has a DELETE trigger on it, for auditing data changes, the operations being performed during the triggers execution will also be logged.

Replication

During transactional replication, it is the job of the log reader agent to read the transaction log, looking for log records that are associated with changes that need to be replicated to subscribers (i.e. are "pending replication"). Once the changes are replicated, it marks the log entry as "replicated." Slow or delayed log reader activity can lead to records being left as "pending replication" for long periods, during which time they will remain part of the active log, and so the parent VLF cannot be truncated.

A similar problem exists for log records required by the Change Data Capture (CDC) feature. In either case, the log_reuse_wait_desc column of sys.databases will show REPLICATION as the root cause of the problem.

Remember that simply switching to the SIMPLE recovery model, in the hope of truncating the log, will not work since replication and CDC are both supported using SIMPLE recovery, and the log records will continue to be required until the log reader

SQL Agent process harvests them.

ACTIVE_BACKUP_OR_RESTORE

When the log_reuse_wait_desc column shows ACTIVE_BACKUP_OR_RESTORE as the current wait description, a long-running full or differential backup of the database

is the most likely cause of the log reuse problems. During a full or differential backup of the database, the backup process delays log truncation so that the active portion of the transaction log can be included as a part of the full backup. This allows changes made to database pages during the backup operation to be undone when the backup is restored WITH RECOVERY, to bring the database to a consistent state. If such waits are causing persistent problems, you'll need to investigate ways to optimize the backup process, such as by improving the performance of the backups (via backup compression) or improving the performance of the underlying disk I/O system.

Troubleshoot log space issue

If, for some reason, it is not possible to perform a log backup due to a lack of disk space, or the time it would take to perform a log backup exceeds the acceptable time to resolve the problem, then it might, depending on the disaster recovery policy for the database in question, be acceptable to force a truncation of the log by temporarily switching the database to the SIMPLE recovery model in order that inactive VLFs in the log can be truncated on CHECKPOINT. You can then switch the recovery model back to FULL and perform a new full database backup (or a differential backup, assuming a full backup was taken at some previous time) to restart the log chain for point-in-time recovery.

Knowing SQL Server

Saturday, November 23, 2024

Transaction logs - 2

No comments:

Post a Comment

Blog Archive