Sunday, August 18, 2013

Surrogate Vs Natural keys



Surrogate Key:
Surrogate keys are keys that have no “business” meaning and are solely used to identify a record in the table.  Such keys are either database generated (example: Identity in SQL Server, Sequence in Oracle, Sequence/Identity in DB2 UDB etc.) or system generated values (like generated via a table in the schema).

Natural Key: 
Keys are natural if the attribute it represents is used for identification independently of the database schema.  What this basically means is that the keys are natural if people use them example: Invoice-Numbers, Tax-Ids, SSN etc.

Design considerations for choosing the Primary Key:
  1. It should be not null, Unique and should apply to all rows.
  2. It should be minimal (i.e. less number of columns in the PK: ideally it should be 1, if using composite keys, then make sure that those are surrogates and using integer family data-types).
  3. It should be stable over a period of time (should not change i.e. update to the PK columns should not happen).
Surrogate Key
I prefer surrogate keys to be DB controlled rather than being controlled via a next-up table in the schema since that is a more scalable approach.
Pros:
  1. Business Logic is not in the keys.
  2. Small 4-byte key (the surrogate key will most likely be an integer and SQL Server for example requires only 4 bytes to store it, if a bigint, then 8 bytes).
  3. Joins are very fast.
  4. No locking contentions because of unique constraint (this refers to the waits that get developed when two sessions are trying to insert the same unique business key) as the surrogates get generated by the DB and are cached – very scalable.
Cons:
  1. An additional index is needed.  In SQL Server, the PK constraint will always creates a unique index, in Oracle, if an index already exists, PK creation will use that index for uniqueness enforcement (not a con in Oracle).
  2. Cannot be used as a search key.
  3. If it is database controlled, for products that support multiple databases, different implementations are needed, example: identity in SS2k, before triggers and sequences in Oracle, identity/sequence in DB2 UDB.
  4. Always requires a join when browsing the child table(s).
Natural Key
 Pros:
  1. No additional Index.
  2. Can be used as a search key.
 Cons:
  1. If not chosen wisely (business meaning in the key(s)), then over a period of time additions may be required to the PK and modifications to the PK can occur.
  2. If using strings, joins are a bit slower as compared to the int data-type joins, storage is more as well.  Since storage is more, less data-values get stored per index page.  Also, reading strings is a two step process in some RDBMS: one to get the actual length of the string and second to actually perform the read operation to get the value.
  3. Locking contentions can arise if using application driven generation mechanism for the key.
  4. Can’t enter a record until value is known since the value has some meaning.

No comments:

Post a Comment