Thursday, June 9, 2011

Normalization


          The most well-known concept of relational database design, normalization, was developed by Dr. E. F. Codd. It is merely a tool to reduce update anomalies (unusual objects).
It is most beneficial for databases that receive first-generation data, such as a transactional database that receives order data. Therefore, databases that are not used for first-generation data (such as reporting databases, data warehouses, or reference data stores) do not necessarily benefit from normalization.
                             The most common misconception regarding normalization is, believing that more tables mean a higher form of normalization. It is an error to follow the normal forms as if they were a waterfall design process, i.e., beginning with non-normalized design and then correcting for first normal form by eliminating redundant columns, followed by second normal form, and so on.

Normalization is usually defined in terms of normal forms. Each of the normal forms describes a possible error in designing the entities and attributes, and provides a rule for correcting the error. Each progressive form of normalization removes another type of flat-file problem.

A normalized database design has the following advantages over flat-file databases:
  • Improved data integrity owing to the elimination of duplicate storage locations for the same data
  • Reduced locking contention and improved multiple-user concurrency
  • Smaller files

The first normal form means the data is in an entity format, such that the following three conditions are met:
  • Every attribute must contain one unit of data, and each unit of data must fill one attribute. Designs that embed multiple pieces of information within an attribute violate the first normal form.
  • Each attribute must have a unique name and a unique purpose. An entity should have no repeating attributes. If the attributes repeat, or the entity is very wide, the object is too broadly designed. A design that repeats attributes, such as an order entity that includes item1, item2, and item3 attributes to hold multiple line items, violates the first normal form.
  • If the entity design requires or permits duplicate tuples, that design violates the first normal form.

The second normal form ensures that each attribute is in fact an attribute of the entity. It’s an issue of dependency. Every attribute must require its primary key, or it doesn’t belong in the database. If the entity’s primary key is a single value, this isn’t too difficult. Composite primary keys can sometimes get into trouble with the second normal form if the attributes aren’t dependent on every attribute in the primary key. If the attribute depends on one of the primary key attributes but not the other, then that is a partial dependency, which violates the second normal form.

The third normal form checks for transitive dependencies. A transitive dependency is similar to a partial dependency in that they both refer to attributes that are not fully dependent on a primary key. A dependency is transient when attribute1 is dependent on attribute2, which is dependent on the primary key. The third normal form is resolved by moving the nondependent attribute to a new entity.

No comments:

Post a Comment