Knowing SQL Server: Normalization

Why Normalize?

· Eliminating data that’s duplicated, increasing the chance it won’t match when you need it

· Avoiding unnecessary coding needed to keep duplicated data in sync

· Keeping tables thin, increasing the number of values that will fit on an 8K physical database page (which will be discussed in more detail in Chapter 9) and decreasing the number of reads that will be needed to read data from a table

· Maximizing the use of clustered indexes, allowing for more optimum data access and joins

· Lowering the number of indexes per table, because indexes are costly to maintain

First Normal Form

For an entity to be in First Normal Form, it must have the following characteristics:

· All attributes must be atomic, that is, only a single value represented in a single attribute in a single instance of an entity.

· All instances of an entity must contain the same number of values.

· All instances of an entity must be different. (Identifying primary key)

All Attributes Must Be Atomic

The goal of this part of First Normal Form is that each attribute should represent only one value, not multiple values. This means there should be nothing like an array, no delimited lists, and no other types of multivalued attributes that you could dream up represented by a single attribute.

All Instances in an Entity Must Contain the Same Number of Values

This part of First Normal Form says that every instance has the same number of attributes. There are two interpretations of this:

· Entities have a fixed number of attributes (and tables have a fixed number of columns).

· Entities should be designed such that every attribute has a fixed number of values associated with it.

Programming Anomalies Avoided by First Normal Form

· Modifying lists in a single column – if you have a column having multiple email ids, then updating one of them requires a complex query

· Modifying multipart values – if you have a phone number column with AAA-EEE-NNNN format, updating AAA/EEE/NNNN requires a complex query

· Dealing with a variable number of facts in an instance – having multiple columns of same kind such as multiple payments or number of children

Second Normal Form

Second Normal Form deals with the relationships and functional dependencies between non-key attributes. An entity complying with Second Normal Form has to have the following characteristics:

· The entity must be in First Normal Form.

· Each attribute must be a fact describing the entire key.

Programming Problems Avoided by Second Normal Form

All the programming issues that arise with Second Normal Form (as well as the upcoming Third and Boyce-Codd Normal Forms) deal with functional dependencies that can end up corrupting data in ways that are not always easy to notice.

Clues That an Entity Is Not in Second Normal Form

· Repeating key attribute name prefixes, indicating that the values are probably describing some additional entity

· Data in repeating groups, showing signs of functional dependencies between attributes

· Composite keys without a foreign key, which might be a sign you have key values that identify multiple things in the key, rather than a single thing

Third Normal Form

Third Normal Form differs from Second Normal Form in that it deals with the relationship of non-key data to non-key data, rather than to key data. The problems are similar, and many of the symptoms are the same, but it can be harder to locate the general kinds of violations that this form tends to deal with. The main difference is that data in one attribute, instead of being dependent on the key, is dependent on data in another non-key attribute. An entity that’s in Third Normal Form has the following characteristics:

· The entity must be in Second Normal Form.

· Non-key attributes cannot describe other non-key attributes.

Overnormalizing

The goal of normalization is to store only one piece of data in one place. However, you could reduce multiple values down to being stored in a single place, when more than one value is required. Overnormalizing is normalizing without considering all the consequences. Sometimes in the normalization

process, we can lose important information because all the business rules are not considered.

For example, sometimes, we need to maintain productcost column in both product and invoice tables. Because, in few cases, that product can be sold at lesser price than cost defined in the product table. And also, line item cost should be in the invoice table though it depends on quantity and cost and violates third normal form. Remember, cost, quantity, and line item cost are non-key attributes.

Knowing SQL Server

Saturday, July 4, 2015

Normalization

No comments:

Post a Comment

Blog Archive