Saturday, July 4, 2015

Data Compression

In SQL Server 2005 SP2, Microsoft introduced data type-level compression, which allowed data of the decimal data type to be stored in a variable length field, also referred to as vardecimal. In SQL Server 2008, the concept of compression is extended even further to all of the fixed length data types, including int, char, and float data types. Basically, you can allow SQL Server to save space by storing your data like it was a variable-sized type, yet in usage, the data will appear and behave like a fixed length type.

For example, if you stored the value of 100 in an int column, SQL Server needn’t use all 32 bits; It can simply use 8 bits (1 byte) as tinyint. Another case is when you use a char (30) column but store only two characters; 28 characters could be saved. There is an overhead of 2 bytes per variable-length column (or 4 bits if the size of the column is less than 8 bytes). Note that compression is only available in the Enterprise Edition.

This data type-level compression is referred to as row compression. Row compression is a very interesting thing for many databases that use lots of fixed length data (for example, integers, especially for surrogate keys).

SQL Server 2008 also includes an additional compression capability called page compression. With page compression (which includes row compression), the storage engine does a couple of interesting things to compress the data on a page:
·         Prefix compression: Looks for repeated values in a value (like '0000001' and compresses the prefix to something like 6-0 (six zeros)
·         Dictionary compression: For all values on the page, the storage engine looks for duplication and stores the duplicated value once and then stores pointers on the data pages where the duplicated values originally resided.

You can apply data compression to your tables and indexes with the CREATE TABLE, ALTER TABLE, CREATE INDEX, and ALTER INDEX syntaxes.

CREATE TABLE test
(
testId int,
value int
)
WITH (DATA_COMPRESSION = ROW) -- PAGE or NONE
ALTER TABLE test REBUILD WITH (DATA_COMPRESSION = PAGE) ;


Giving advice on whether to use compression is not really possible without knowing the factors that surround your actual situation. One tool you can use is the system procedure— sp_estimate_data_compression_savings—to check existing data to see just how compressed the data in the table or index would be after applying compression, but it won’t tell you how the compression will positively or negatively affect your performance. There are trade-offs to any sorts of compression. CPU utilization will certainly go up, because instead of directly using the data right from the page, the query processor will have to translate the values from the compressed format into the uncompressed format that SQL Server will use. On the other hand, if you have a lot of data that would benefit from compression, you could possibly lower your I/O enough to make doing so worth the cost. 

No comments:

Post a Comment