SQL Server
supports table and index partitioning. The data of partitioned tables and
indexes is divided into units that can be spread across more than one file
group in a database. The data is partitioned horizontally, so that groups of
rows are mapped into individual partitions. All partitions of a single index or
table must reside in the same database. The table or index is treated as a single
logical entity when queries or updates are performed on the data.
Partitioning large
tables or indexes can have the following manageability and performance
benefits.
- You
can transfer or access subsets of data quickly and efficiently, while
maintaining the integrity of a data collection. For example, an operation
such as loading data from an OLTP to an OLAP system takes only seconds,
instead of the minutes and hours the operation takes when the data is not
partitioned.
- You
can perform maintenance operations on one or more partitions more quickly.
The operations are more efficient because they target only these data
subsets, instead of the whole table. For example, you can choose to
compress data in one or more partitions or rebuild one or more partitions
of an index.
- You
may improve query performance, based on the types of queries you
frequently run and on your hardware configuration. For example, the query
optimizer can process equi-join queries between two or more partitioned
tables faster when the partitioning columns in the tables are the same,
because the partitions themselves can be joined.
When SQL Server
performs data sorting for I/O operations, it sorts the data first by partition.
SQL Server accesses one drive at a time, and this might reduce performance. To
improve data sorting performance, stripe the data files of your partitions
across more than one disk by setting up a RAID. In this way, although SQL
Server still sorts data by partition, it can access all the drives of each
partition at the same time.
In addition, you
can improve performance by enabling lock escalation at the partition level
instead of a whole table. This can reduce lock contention on the table.
Partition
function
A database object that defines how the
rows of a table or index are mapped to a set of partitions based on the values
of certain column, called a partitioning column. That is, the partition
function defines the number of partitions that the table will have and how the
boundaries of the partitions are defined. For example, given a table that
contains sales order data, you may want to partition the table into twelve
(monthly) partitions based on a datetime column such as a sales date.
Partition
scheme
A database object that maps the partitions
of a partition function to a set of filegroups. The primary reason for placing
your partitions on separate filegroups is to make sure that you can
independently perform backup operations on partitions. This is because you can
perform backups on individual filegroups.
Partitioning
column
The column of a table or index that a
partition function uses to partition the table or index. Computed columns that
participate in a partition function must be explicitly marked PERSISTED. All
data types that are valid for use as index columns can be used as a
partitioning column, except timestamp. The ntext, text, image, xml,
varchar(max), nvarchar(max), or varbinary(max) data types cannot be specified.
Also, Microsoft .NET Framework common language runtime (CLR) user-defined
type and alias data type columns cannot be specified.
Switching in and switching out partitions
can be very fast, but an exclusive lock— Called ‘SCH-M’, or Schema Modification
lock— is required. This means you can get blocked from loading or removing data
from your table. Also, all of your enabled non-clustered indexes must be
“partition aligned” to switch a partition in. This means the partitioning key
must be part of each of those indexes. If you need to maintain uniqueness on a
set of columns that doesn’t include the partitioning key (which is often the
case in OLTP environments), this can pose a problem.
Query Performance on Partitioned Tables:
Partition Elimination and Beyond
SQL Server tries to identify when it can
use limited parts of a partitioned table. The SQL Server query optimizer may
direct a query to only a single partition, multiple partitions, or the
whole table. Using fewer partitions than the entire table is called “partition
elimination.”
Statistics are maintained for the entire
partitioned table or index— you don’t get additional steps in your histogram
for each partition. This means that the SQL Server Query optimizer may still
have a very hard time knowing how much data is going to be returned by your
query, and this difficulty will increase as your table grows. The result may be
slow queries.
Queries will perform better when you
specify the partitioning key in the criteria (aka the “where clause”). So,
although partitioning is “transparent,” for existing applications, query tuning
will almost always be required.
$PARTITION returns an int value
between 1 and the number of partitions of the partition function. $PARTITION
returns the partition number for any valid value, regardless of whether the
value currently exists in a partitioned table or index that uses the partition
function.
CREATE PARTITION FUNCTION
RangePF ( int )
AS RANGE FOR VALUES (10,
100, 1000) ;
GO
SELECT $PARTITION.RangePF (10) ;
No comments:
Post a Comment