Knowing SQL Server: Partitioned Tables and Indexes

SQL Server supports table and index partitioning. The data of partitioned tables and indexes is divided into units that can be spread across more than one file group in a database. The data is partitioned horizontally, so that groups of rows are mapped into individual partitions. All partitions of a single index or table must reside in the same database. The table or index is treated as a single logical entity when queries or updates are performed on the data.

Partitioning large tables or indexes can have the following manageability and performance benefits.

You can transfer or access subsets of data quickly and efficiently, while maintaining the integrity of a data collection. For example, an operation such as loading data from an OLTP to an OLAP system takes only seconds, instead of the minutes and hours the operation takes when the data is not partitioned.
You can perform maintenance operations on one or more partitions more quickly. The operations are more efficient because they target only these data subsets, instead of the whole table. For example, you can choose to compress data in one or more partitions or rebuild one or more partitions of an index.
You may improve query performance, based on the types of queries you frequently run and on your hardware configuration. For example, the query optimizer can process equi-join queries between two or more partitioned tables faster when the partitioning columns in the tables are the same, because the partitions themselves can be joined.

When SQL Server performs data sorting for I/O operations, it sorts the data first by partition. SQL Server accesses one drive at a time, and this might reduce performance. To improve data sorting performance, stripe the data files of your partitions across more than one disk by setting up a RAID. In this way, although SQL Server still sorts data by partition, it can access all the drives of each partition at the same time.

In addition, you can improve performance by enabling lock escalation at the partition level instead of a whole table. This can reduce lock contention on the table.

Partition function

A database object that defines how the rows of a table or index are mapped to a set of partitions based on the values of certain column, called a partitioning column. That is, the partition function defines the number of partitions that the table will have and how the boundaries of the partitions are defined. For example, given a table that contains sales order data, you may want to partition the table into twelve (monthly) partitions based on a datetime column such as a sales date.

Partition scheme

A database object that maps the partitions of a partition function to a set of filegroups. The primary reason for placing your partitions on separate filegroups is to make sure that you can independently perform backup operations on partitions. This is because you can perform backups on individual filegroups.

Partitioning column

The column of a table or index that a partition function uses to partition the table or index. Computed columns that participate in a partition function must be explicitly marked PERSISTED. All data types that are valid for use as index columns can be used as a partitioning column, except timestamp. The ntext, text, image, xml, varchar(max), nvarchar(max), or varbinary(max) data types cannot be specified. Also, Microsoft .NET Framework common language runtime (CLR) user-defined type and alias data type columns cannot be specified.

Switching in and switching out partitions can be very fast, but an exclusive lock— Called ‘SCH-M’, or Schema Modification lock— is required. This means you can get blocked from loading or removing data from your table. Also, all of your enabled non-clustered indexes must be “partition aligned” to switch a partition in. This means the partitioning key must be part of each of those indexes. If you need to maintain uniqueness on a set of columns that doesn’t include the partitioning key (which is often the case in OLTP environments), this can pose a problem.

Query Performance on Partitioned Tables: Partition Elimination and Beyond

SQL Server tries to identify when it can use limited parts of a partitioned table. The SQL Server query optimizer may direct a query to only a single partition, multiple partitions, or the whole table. Using fewer partitions than the entire table is called “partition elimination.”

Statistics are maintained for the entire partitioned table or index— you don’t get additional steps in your histogram for each partition. This means that the SQL Server Query optimizer may still have a very hard time knowing how much data is going to be returned by your query, and this difficulty will increase as your table grows. The result may be slow queries.

Queries will perform better when you specify the partitioning key in the criteria (aka the “where clause”). So, although partitioning is “transparent,” for existing applications, query tuning will almost always be required.

$PARTITION returns an int value between 1 and the number of partitions of the partition function. $PARTITION returns the partition number for any valid value, regardless of whether the value currently exists in a partitioned table or index that uses the partition function.

CREATE PARTITION FUNCTION RangePF ( int )

AS RANGE FOR VALUES (10, 100, 1000) ;

SELECT $PARTITION.RangePF (10) ;

Knowing SQL Server

Sunday, August 18, 2013

Partitioned Tables and Indexes

No comments:

Post a Comment

Blog Archive