Knowing SQL Server: March 2014

Tuesday, March 25, 2014

Select into - restrictions

You cannot specify a table variable or table-valued parameter as the new table.
You cannot use SELECT…INTO to create a partitioned table, even when the source table is partitioned. SELECT...INTO does not use the partition scheme of the source table; instead, the new table is created in the default filegroup. To insert rows into a partitioned table, you must first create the partitioned table and then use the INSERT INTO...SELECT FROM statement.
Indexes, constraints, and triggers defined in the source table are not transferred to the new table, nor can they be specified in the SELECT...INTO statement. If these objects are required, you can create them after executing the SELECT...INTO statement.
Specifying an ORDER BY clause does not guarantee the rows are inserted in the specified order.
When a sparse column is included in the select list, the sparse column property does not transfer to the column in the new table. If this property is required in the new table, alter the column definition after executing the SELECT...INTO statement to include this property.
When a computed column is included in the select list, the corresponding column in the new table is not a computed column. The values in the new column are the values that were computed at the time SELECT...INTO was executed.

When an existing identity column is selected into a new table, the new column inherits the IDENTITY property, unless one of the following conditions is true:

The SELECT statement contains a join, GROUP BY clause, or aggregate function.
Multiple SELECT statements are joined by using UNION.
The identity column is listed more than one time in the select list.
The identity column is part of an expression.
The identity column is from a remote data source.

Logical Processing Order of the SELECT statement

FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP

A cursor is a memory resident set of pointers -- meaning it occupies memory from your system that may be available for other processes. Microsoft SQL Server implements cursors by creating a temporary table and populating it with the query's result set. If a cursor is not properly closed (deallocated), the resources will not be freed until the SQL session (connection) itself is closed. This wasting of resources on the server can lead to performance degradations and failures.

The types of cursors used are very important in terms of performance.

1. FORWARD_ONLY - Specifies that the cursor can only be scrolled from the first to the last row. FETCH NEXT is the only supported fetch option. If FORWARD_ONLY is specified without the STATIC, KEYSET, or DYNAMIC keywords, the cursor operates as a DYNAMIC cursor.

2. STATIC - Defines a cursor that makes a temporary copy of the data to be used by the cursor. All requests to the cursor are answered from this temporary table in tempdb; therefore, modifications made to base tables are not reflected in the data returned by fetches made to this cursor.

3. READ_ONLY - Prevents updates made through this cursor. The cursor cannot be referenced in a WHERE CURRENT OF clause in an UPDATE or DELETE statement. This option overrides the default capability of a cursor to be updated.

4. FAST_FORWARD - Specifies a FORWARD_ONLY, READ_ONLY cursor with performance optimizations enabled. FAST_FORWARD cannot be specified if SCROLL or FOR_UPDATE is also specified.

Cursors can be faster than a while loop but they do have more overhead. If your cursor will not be updating the base tables, use a FAST_FORWARD cursor for optimal performance.

Disadvantages

1. Fetching a row from the cursor may result in a network round trip each time. This uses much more network bandwidth than would ordinarily be needed for the execution of a single SQL statement like DELETE. Repeated network round trips can severely impact the speed of the operation using the cursor.

2. Also, a majority of cursor operations occur in tempdb, so a heavily used tempdb will be even more overloaded with the use of cursors.

Sunday, March 23, 2014

Collation

A collation encodes the rules governing the proper use of characters for either a language, such as Greek or Polish, or an alphabet, such as Latin1_General (the Latin alphabet used by western European languages).

Each SQL Server collation specifies three properties:

The sort order to use for Unicode data types (nchar, nvarchar, and ntext). A sort order defines the sequence in which characters are sorted, and the way characters are evaluated in comparison operations.
The sort order to use for non-Unicode character data types (char, varchar, and text).
The code page used to store non-Unicode character data.

If the users of your instance of SQL Server speak multiple languages, you should pick a collation that best supports the requirements of the various languages. For example, if the users generally speak western European languages, choose the Latin1_General collation.

Collation refers to a set of rules that determine how data is sorted and compared. Character data is sorted using rules that define the correct character sequence, with options for specifying case-sensitivity, accent marks, kana character types and character width.

Case sensitivity

If A and a, B and b, etc. are treated in the same way then it is case-insensitive. A computer treats A and a differently because it uses ASCII code to differentiate the input. The ASCII value of A is 65, while a is 97. The ASCII value of B is 66 and b is 98.

Accent sensitivity

If a and á, o and ó are treated in the same way, then it is accent-insensitive. A computer treats a and á differently because it uses ASCII code for differentiating the input. The ASCII value of a is 97 and áis 225. The ASCII value of o is 111 and ó is 243.

Kana Sensitivity

When Japanese kana characters Hiragana and Katakana are treated differently, it is called Kana sensitive.

Width sensitivity

When a single-byte character (half-width) and the same character when represented as a double-byte character (full-width) are treated differently than it is width sensitive.

You can use the COLLATE keyword in various ways and at several levels:

COLLATE on database creation You can use the COLLATE clause of the CREATE DATABASE or ALTER DATABASE statement to specify the default collation of the database. You can also specify a collation when you create a database using SQL Server Management Studio. If you do not specify a collation, the database is assigned the default collation of the instance of SQL Server.

COLLATE on table creation You can specify collations for each varchar or char column using the COLLATE clause in the CREATE TABLE or ALTER TABLE statement. You can also specify a collation when you create a table using SQL Server Management Studio. If you do not specify a collation, the column is assigned the default collation of the database.

COLLATE by casting or expression You can use the COLLATE clause to cast an expression to a certain collation. You can assign the COLLATE clause to any ORDER BY or comparison statement.

SQL Server 2005 supports more than 1,000 collation types.

Alter Column - Rules to remember

The modified column cannot be any one of the following:

· A column with a timestamp data type.

· The ROWGUIDCOL for the table.

· A computed column or used in a computed column.

· Used in a PRIMARY KEY or [FOREIGN KEY] REFERENCES constraint.

· Used in a CHECK or UNIQUE constraint. However, changing the length of a variable-length column used in a CHECK or UNIQUE constraint is allowed.

· Associated with a default definition. However, the length, precision, or scale of a column can be changed if the data type is not changed.

· The data type of a column of a partitioned table cannot be changed.

· The data type of columns included in an index cannot be changed unless the column is a varchar, nvarchar, or varbinary data type, and the new size is equal to or larger than the old size.

· A columns included in a primary key constraint, cannot be changed from NOT NULL to NULL.

Some data type changes may cause a change in the data. For example, changing an nchar or nvarchar column to char or varchar may cause the conversion of extended characters. Reducing the precision or scale of a column may cause data truncation.

The following are criteria for type_name of an altered column:

· The previous data type must be implicitly convertible to the new data type.

type_name cannot be timestamp.
ANSI_NULL defaults are always on for ALTER COLUMN; if not specified, the column is nullable.
ANSI_PADDING padding is always ON for ALTER COLUMN.
If the modified column is an identity column, new_data_type must be a data type that supports the identity property.

Saturday, March 22, 2014

Views

The SELECT clauses in a view definition cannot include the following:

· An ORDER BY clause, unless there is also a TOP clause in the select list of the SELECT statement

· The INTO keyword

· The OPTION clause

· A reference to a temporary table or a table variable.

Check option

Forces all data modification statements executed against the view to follow the criteria set within select_statement. When a row is modified through a view, the WITH CHECK OPTION makes sure the data remains visible through the view after the modification is committed. Any updates performed directly to a view's underlying tables are not verified against the view, even if CHECK OPTION is specified.

When schemabinding is specified, the base table or tables cannot be modified in a way that would affect the view definition. The view definition itself must first be modified or dropped to remove dependencies on the table that is to be modified. When you use SCHEMABINDING, the select_statement must include the two-part names (schema.object) of tables, views, or user-defined functions that are referenced. All referenced objects must be in the same database.

ENCRYPTION Encrypts the entries in sys.syscomments that contain the text of the CREATE VIEW statement. Using WITH ENCRYPTION prevents the view from being published as part of SQL Server replication.

A view can be created only in the current database. The CREATE VIEW must be the first statement in a query batch. A view can have a maximum of 1,024 columns.

When querying through a view, the Database Engine checks to make sure that all the database objects referenced anywhere in the statement exist and that they are valid in the context of the statement, and that data modification statements do not violate any data integrity rules. A check that fails returns an error message. If a view depends on a table or view that was dropped, the Database Engine produces an error message when anyone tries to use the view. If the new table or view structure changes, the view must be dropped and re-created.

Updatable Views - Conditions

You can modify the data of an underlying base table through a view, as long as the following conditions are true:

· Any modifications, including UPDATE, INSERT, and DELETE statements, must reference columns from only one base table.

· The columns being modified in the view must directly reference the underlying data in the table columns. The columns cannot be derived in any other way as through the following:

§ An aggregate function: AVG, COUNT, SUM, MIN, MAX, GROUPING, STDEV, STDEVP, VAR, and VARP.

§ A computation. The column cannot be computed from an expression that uses other columns. Columns that are formed by using the set operators UNION, UNION ALL, CROSSJOIN, EXCEPT, and INTERSECT amount to a computation and are also not updatable.

§ The columns being modified are not affected by GROUP BY, HAVING, or DISTINCT clauses.

§ TOP is not used anywhere in the select_statement of the view together with the WITH CHECK OPTION clause.

Partitioned views

Partitioned views allow the data in a large table to be split into smaller member tables. The data is partitioned between the member tables based on ranges of data values in one of the columns. The data ranges for each member table are defined in a CHECK constraint specified on the partitioning column. A view that uses UNION ALL to combine selects of all the member tables into a single result set is then defined. When SELECT statements referencing the view specify a search condition on the partition column, the query optimizer uses the CHECK constraint definitions to determine which member table contains the rows.

To perform updates on a partitioned view, the partitioning column must be a part of the primary key of the base table. If a view is not updatable, you can create an INSTEAD OF trigger on the view that allows updates. You should design error handling into the trigger to make sure that no duplicate rows are inserted.

CHECK constraints are not needed for the partitioned view to return the correct results. However, if the CHECK constraints have not been defined, the query optimizer must search all the tables instead of only those that cover the search condition on the partitioning column. Without the CHECK constraints, the view operates like any other view with UNION ALL. The query optimizer cannot make any assumptions about the values stored in different tables and it cannot skip searching the tables that participate in the view definition.

If all the member tables referenced by a partitioned view are on the same server, the view is a local partitioned view. If the member tables are on multiple servers, the view is a distributed partitioned view. Distributed partitioned views can be used to spread the database processing load of a system across a group of servers.

Partitioned views make it easier to maintain the member tables independently. For example, you can do the following at the end of a period:

The definition of the partitioned view for current results can be changed to add the newest period and drop the oldest period.
The definition of the partitioned view for past results can be changed to add the period just dropped from the current results view. The past results view can also be updated to remove and archive the oldest period it covers.

When you insert data into the partitioned views, the sp_executesql system stored procedure can be used to create INSERT statements with execution plans that have a significant chance of being reused in systems with many concurrent users.

Limitations

Tables referenced in updateable partitioned views cannot have identity column.

Tables cannot have computed columns if you want to have updateable partitioned view.

The partitioning column has to be defined using equality or inequality operators.

Monday, March 3, 2014

Sparse Column

With the introduction of the new Sparse Column feature in SQL Server 2008, it is now possible to declare a column as Sparse and any time a NULL value is entered in the column it will not use any space. One of the tricks here is to figure out when to determine when a column should be defined as Sparse or not.

So why not just declare all columns as Sparse Columns?

Sparse Columns require four extra bytes of storage for each non NULL fixed-length data type value in the table and zero bytes to store a NULL value; therefore it is very important to have the correct threshold per data type or you will end up using more space instead of gaining it. The fewer bytes a data type uses, the higher the percentage of NULL values are required to save space.

CREATE TABLE CustomerInfo

(CustomerID INT PRIMARY KEY,

Address_Line1 VARCHAR(100) NOT NULL,

Address_Line2 VARCHAR(100) NOT NULL,

Address_Line3 VARCHAR(100) NOT NULL,

Address_Line4 VARCHAR(100) NULL,

Address_Line5 VARCHAR(100) NULL,

Address_Line6 VARCHAR(100) SPARSE NULL,

Address_Line7 VARCHAR(100) SPARSE NULL,

Address_Line8 VARCHAR(100) SPARSE NULL,

)

Filtered Index

A Filtered Index, which is an optimized non-clustered index, allows us to define a filter predicate, a WHERE clause, while creating the index. The B-Tree containing rows from the filtered index will contain only those rows which satisfy the filter criteria used while creating the index. This optimized index offers several benefits over a full table non-clustered index as follows:

The filtered index contains only those rows which satisfy the defined filter criteria. As a result it reduces the storage space requirement for the index. In the example below I will explain it more.
The filtered statistics or statistics for a filtered index are more compact and accurate, because they consider only the rows in the filtered index and the smaller size of the filtered index reduces the cost/overhead of updating the statistics.
The impact of data modification is less with a filtered index as it is updated only when the data of the index is impacted or a new record is inserted matching the filter criteria.
Maintenance costs will be reduced as well since only a subset of rows will be in consideration while re-organizing or rebuilding the index.
And most important, as it is an optimized non-clustered index, the query performance will improve if only a subset of data, which is covered by the filtered index criteria, is required.

Filtered Index	Indexed Views
A Filtered Index is created on column(s) of a particular table.	Index Views can be created on column(s) from multiple base tables.
A Filtered Index cannot use complex logic in its WHERE clause, for example the LIKE clause is not allowed, only simple comparison operators are allowed.	This limitation does not apply to indexed views and you can design your criteria as complex as you want.
A Filtered Index can be rebuilt online.	Indexed views cannot be rebuilt online.
You can create your Filtered Index as a non-unique index.	Indexed views can only be created as unique index.

Usage Scenario

When you have mostly NULL values (this column can be defined as SPARSE column in SQL Server 2008 to save space required by NULL storage) and you normally pull rows with non-NULL values.
When you have several categories of data (multiple domain range) in a single column and often select rows for one or few categories. For example, let's consider the State column in the CustomerAddress table and often you query customers addresses by state. So you can create a filtered index on the State column for StateA, StateB, StateC values. If you execute a query to pull data where State = StateA, the filtered index where State = StateA will be used and so on.

WHAT YOU CAN DO IN A FILTERED INDEX…

Use equality or inequality operators, such as =, >=, <, and more in the WHERE clause.
Use IN to create an index for a range of values. (This can support a query that does an “OR” –read about “OR” and “IN” with filtered indexes here.)
Create multiple filtered indexes on one column. In my order status example, I could have an index WHERE Status = ‘Open’, and I could have another index WHERE Status = ‘Shipping’.
Create a filtered index for all NOT NULL values – or all NULL values.

WHAT YOU CAN’T DO IN A FILTERED INDEX…

Create filtered indexes in SQL Server 2005.
Use certain expressions, such as BETWEEN, NOT IN, or a CASE statement.
Use date functions such as DATEADD for a rolling date range – the value in WHERE clause must be exact.
The query optimizer won’t consider filtered indexes if you’re using local variables or parameterized SQL for the predicate that matches the filter.

CREATE NONCLUSTERED INDEX FI_Employee_DOJ

ON Employee(DOJ)

WHERE DOJ IS NOT NULL