Why
Normalize?
·
Eliminating data
that’s duplicated, increasing the chance it won’t match when you need it
·
Avoiding
unnecessary coding needed to keep duplicated data in sync
·
Keeping tables
thin, increasing the number of values that will fit on an 8K physical database
page (which will be discussed in more detail in Chapter 9) and decreasing the
number of reads that will be needed to read data from a table
·
Maximizing the
use of clustered indexes, allowing for more optimum data access and joins
·
Lowering the
number of indexes per table, because indexes are costly to maintain
First Normal Form
For an entity to be in First Normal Form, it must
have the following characteristics:
·
All attributes
must be atomic, that is, only a single value represented in a single attribute
in a single instance of an entity.
·
All instances of
an entity must contain the same number of values.
·
All instances of
an entity must be different. (Identifying primary key)
All
Attributes Must Be Atomic
The goal of this part of First Normal Form is that
each attribute should represent only one value, not multiple values. This means
there should be nothing like an array, no delimited lists, and no other types
of multivalued attributes that you could dream up represented by a single
attribute.
All Instances in an Entity Must Contain the Same Number of Values
This part of First Normal Form says that every
instance has the same number of attributes. There are two interpretations of
this:
·
Entities have a fixed number of attributes (and
tables have a fixed number of columns).
·
Entities should be designed such that every
attribute has a fixed number of values associated with it.
Programming
Anomalies Avoided by First Normal Form
·
Modifying lists in a single column – if you have a
column having multiple email ids, then updating one of them requires a complex
query
·
Modifying multipart values – if you have a phone
number column with AAA-EEE-NNNN format, updating AAA/EEE/NNNN requires a
complex query
·
Dealing with a variable number of facts in an
instance – having multiple columns of same kind such as multiple payments or
number of children
Second Normal Form
Second Normal Form deals with the relationships and functional
dependencies between non-key attributes. An entity complying with Second Normal
Form has to have the following characteristics:
·
The entity must be in First Normal Form.
·
Each attribute must be a fact describing the entire
key.
Programming
Problems Avoided by Second Normal Form
All the programming issues that arise with Second Normal Form (as well as
the upcoming Third and Boyce-Codd Normal Forms) deal with functional
dependencies that can end up corrupting data in ways that are not always easy
to notice.
Clues That an
Entity Is Not in Second Normal Form
·
Repeating key attribute name prefixes, indicating
that the values are probably describing some additional entity
·
Data in repeating groups, showing signs of
functional dependencies between attributes
·
Composite keys without a foreign key, which might be
a sign you have key values that identify multiple things in the key, rather
than a single thing
Third Normal Form
Third Normal Form differs from Second Normal Form in
that it deals with the relationship of non-key data to non-key data, rather than
to key data. The problems are similar, and many of the symptoms are the same,
but it can be harder to locate the general kinds of violations that this form tends
to deal with. The main difference is that data in one attribute, instead of
being dependent on the key, is dependent on data in another non-key attribute. An
entity that’s in Third Normal Form has the following characteristics:
·
The entity must be in Second Normal Form.
·
Non-key attributes cannot describe other non-key
attributes.
Overnormalizing
The goal of normalization is to store only one piece
of data in one place. However, you could reduce multiple values down to being
stored in a single place, when more than one value is required. Overnormalizing
is normalizing without considering all the consequences. Sometimes in the
normalization
process, we can lose important information because all the business rules
are not considered.
For example, sometimes, we need to maintain productcost column in both
product and invoice tables. Because, in few cases, that product can be sold at
lesser price than cost defined in the product table. And also, line item cost
should be in the invoice table though it depends on quantity and cost and
violates third normal form. Remember, cost, quantity, and line item cost are
non-key attributes.
No comments:
Post a Comment