Saturday, July 4, 2015

Physical Database Structure

At the physical level, databases contain files, where the data is physically stored. These files are basically just typical Microsoft Windows files. These different files can be logically grouped into filegroups that control where they are stored on a disk. Each file contains a number of extents, which are a 64K allocation in a database file that’s made up of eight individual contiguous 8K pages. Finally, the page is the basic unit of data storage in SQL Server databases. Everything that’s stored in SQL Server is stored on pages of several types: data, index, and overflow.

Files and Filegroups
At the top level of a SQL Server instance we have the database. The database is comprised of one or more filegroups, which are a logical grouping of one or more files. We can place different filegroups on different disk drives (hopefully on a different disk drive controller) to distribute the I/O load evenly across the available hardware. It’s possible to have multiple files in the filegroup, in which case SQL Server allocates space across each file in the filegroup. For best performance, it’s best to have no more files in a filegroup than you have physical CPUs. There’s usually very little, if any, performance benefit to using multiple filegroups if they’re all on the same physical drive.

A filegroup contains one or more files, which are actual operating system files. Each database has at least one primary filegroup, whose files are called primary files (commonly suffixed as .mdf). Each database can possibly have other secondary filegroups containing the secondary files (commonly suffixed as .ndf), which are in any other filegroups. Files may only be a part of a single filegroup. SQL Server proportionally fills files by allocating extents in each filegroup equally. To place an object in a filegroup other than the default, you need to specify the name of the filegroup using the ON clause of the table- or index-creation statement, for example:
CREATE TABLE <tableName>
(...) ON <fileGroupName>

For example, if you wanted the file to start at 1GB and grow in chunks of 100MB up to 2GB, you could specify the following:
CREATE DATABASE demonstrateFileGrowth ON
PRIMARY ( NAME = Primary1,FILENAME = 'c:\demonstrateFileGrowth_primary.mdf',
SIZE = 1GB, FILEGROWTH=100MB, MAXSIZE=2GB)
LOG ON ( NAME = Log1,FILENAME = 'c:\demonstrateFileGrowth_log.ldf', SIZE = 10MB)

The growth settings are fine for smaller systems, but it’s usually better to make the files large enough so there’s no need for them to grow. File growth can be slow and cause ugly bottlenecks when OLTP traffic is trying to use a file that’s growing. Even with the existence of instant file allocation, it’s still better to have the space allocated beforehand, as you then have cordoned off the space ahead of time. You can query the sys.filegroups catalog view to view the files in the newly created database.
SELECT fg.name as file_group,
df.name as file_logical_name,
df.physical_name as physical_file_name
FROM sys.filegroups fg
join sys.database_files df
on fg.data_space_id = df.data_space_id

Extents and Pages
Files are further broken down into a number of extents, each consisting of eight separate 8K pages where tables, indexes, and so on are physically stored. SQL Server only allocates space in a database to extents. When files grow, you will notice that the size of files will be incremented only in 64K increments. Each extent in turn has eight pages that hold one specific type of data each:
·         Data: Table data.
·         Index: Index data.
·         Overflow data: Used when a row is greater than 8,060 bytes, or for varchar(max), varbinary(max), text, or image values.
·         Allocation map: Information about the allocation of extents.
·         Page free space: Information about what different pages are allocated for.
·         Index allocation: Information about extents used for table or index data.
·         Bulk changed map: Extents modified by a bulk INSERT operation.
·         Differential changed map: Extents that have changed since the last database backup command. This is used to support differential backups.

In larger databases, most every extent will contain just one type of page, but in smaller databases, SQL Server can place any kind of page in the same extent. When all data is of the same type, it’s known as a uniform extent. When pages are of various types, it’s referred to as a mixed extent. SQL Server places all table data in pages, with a header that contains metadata about the page (object ID of the owner, type of page, and so on), as well as the rows of data. At the end of the page are the offset values that tell the relational engine where the rows start. The header of the page contains identification values such as the page number, the object ID of the object the data is for, compression information, and so on. The data rows hold the actual data, Finally, there’s an allocation block that has the offsets/pointers to the row data.


Data on Pages
When you get down to the row level, the data is laid out with metadata, fixed length fields, and variable length fields. The metadata describes the row, gives information about the variable length fields, and so on. Generally speaking, since data is dealt with by the query processor at the page level, even if only a single page is needed, data can be accessed very rapidly no matter the exact physical representation.

The maximum amount of data that can be placed on a single page (including overhead from variable fields) is 8,060 bytes. When a data row grows larger than 8,060 bytes, the data in variable length columns can spill out onto an overflow page. A 16-byte pointer is left on the original page and points to the page where the overflow data is placed.

No comments:

Post a Comment