Thursday, June 9, 2011

Data Integrity


The ability to ensure that persisted data can be retrieved without error is central to the Information Architecture Principle. Without data integrity, a query’s answer cannot be guaranteed to be correct. As data is essentially entities and attributes, data integrity consists of entity integrity and domain integrity, which includes referential integrity and user-defined integrity. Transactional integrity, which deals with how data is written and retrieved, is defined by the ACID principles

Entity integrity involves the structure (primary key and its attributes) of the entity. If the primary key is unique and all attributes are scalar and fully dependent on the primary key, then the integrity of the entity is good. In the physical schema, the table’s primary key enforces entity integrity. Essentially, entity integrity is normalization.

In relational theory terms, a domain is a set of possible values for an attribute, such as integers, bit values, or characters. Domain integrity ensures that only valid data is permitted in the attribute. Nullability (whether a null value is valid for an attribute) is also a part of domain integrity. In the physical schema, the data type and nullability of the row enforce domain integrity.

A subset of domain integrity, referential integrity refers to the domain integrity of foreign keys. Domain integrity says that if an attribute has a value, then that value must be in the domain. In the case of the foreign key, the domain is the list of values in the related primary key. Within a physical schema, a foreign key can be enforced by declarative referential integrity (DRI) or by a custom trigger attached to the table.

Besides the relational theory integrity concerns, the user-integrity requirements must also be enforced, as follows:
. Simple business rules, such as a restriction to a domain, limit the list of valid data entries. Check constraints are commonly used to enforce these rules in the physical schema.
. Complex business rules limit the list of valid data based on some condition. For example, certain tours may require a medical waiver. Implementing these rules in the physical schema generally requires stored procedures or triggers.

No comments:

Post a Comment