Monday, June 27, 2011

Data Types

                  Before knowing various data types, you must know what is actually a data type. A data type is an attribute which indicates what kind of data (characters, numbers, dates etc) will be held in a column. Following is the list of data types in Sql server.

Exact numerics

Type From To
bigint -9,223,372,036,854,775,808 9,223,372,036,854,775,807
int -2,147,483,648 2,147,483,647
smallint -32,768 32,767
tinyint 0 255
bit 0 1
decimal -10^38 +1 10^38 –1
numeric -10^38 +1 10^38 –1
money -922,337,203,685,477.5808 +922,337,203,685,477.5807
smallmoney -214,748.3648 +214,748.3647
numeric and decimal are Fixed precision and scale data types and are functionally equivalent.

Approximate numerics

Type From To
float -1.79E + 308 1.79E + 308
real -3.40E + 38 3.40E + 38

datetime and smalldatetime

Type From To
datetime (3.33 milliseconds accuracy) Jan 1, 1753 Dec 31, 9999
smalldatetime (1 minute accuracy) Jan 1, 1900 Jun 6, 2079

Character Strings

Type Description
char Fixed-length non-Unicode character data with a maximum length of 8,000 characters.
varchar Variable-length non-Unicode data with a maximum of 8,000 characters.
varchar(max) Variable-length non-Unicode data with a maximum length of 231 characters (SQL Server 2005 only).
text Variable-length non-Unicode data with a maximum length of 2,147,483,647 characters.

Unicode Character Strings

Type Description
nchar Fixed-length Unicode data with a maximum length of 4,000 characters.
nvarchar Variable-length Unicode data with a maximum length of 4,000 characters.
nvarchar(max) Variable-length Unicode data with a maximum length of 230 characters (SQL Server 2005 only).
ntext Variable-length Unicode data with a maximum length of 1,073,741,823 characters.

Binary Strings

Type Description
binary Fixed-length binary data with a maximum length of 8,000 bytes.
varbinary Variable-length binary data with a maximum length of 8,000 bytes.
varbinary(max) Variable-length binary data with a maximum length of 231 bytes (SQL Server 2005 only).
image Variable-length binary data with a maximum length of 2,147,483,647 bytes.

Other Data Types

  • sql_variant: Stores values of various SQL Server-supported data types, except text, ntext, and timestamp.
  • timestamp: Stores a database-wide unique number that gets updated every time a row gets updated.
  • uniqueidentifier: Stores a globally unique identifier (GUID).
  • xml: Stores XML data. You can store xml instances in a column or a variable (SQL Server 2005 only).
  • cursor: A reference to a cursor.
  • table: Stores a result set for later processing.

SQL (Structured Query language)

             SQL  is a specialized language for updating, deleting, and requesting information from databases. SQL is an ANSI and ISO standard, and is the standard database query language. Database products from Oracle and Microsoft SQL Server supports SQL.  Certified by ANSI and ISO, SQL has become a database query language standard.It serves both industry-level and academic needs and is used on both individual computers and corporate servers. The SQL queries are the most common and essential SQL operations. Via an SQL query, one can search the database for the information needed. SQL queries are executed with the “SELECT” statement.SELECT retrieves data from one or more tables, or expressions.

A query includes a list of columns to be included in the final result immediately following the SELECT keyword. An asterisk ("*") can also be used to specify that the query should return all columns of the queried tables. SELECT is the most complex statement in SQL, with optional keywords and clauses that include:
  • The FROM clause which indicates the table(s) from which data is to be retrieved. The FROMJOIN subclauses to specify the rules for joining tables. clause can include optional
  • The WHERE clause includes a comparison predicate, which restricts the rows returned by the query. The WHERE clause eliminates all rows from the result set for which the comparison predicate does not evaluate to True.
  • The GROUP BY clause is used to project rows having common values into a smaller set of rows. GROUP BY is often used in conjunction with SQL aggregation functions or to eliminate duplicate rows from a result set. The WHERE clause is applied before the GROUP BY clause.
  • The HAVING clause includes a predicate used to filter rows resulting from the GROUP BYGROUP BY clause, aggregation functions can be used in the HAVING clause predicate. clause. Because it acts on the results of the
  • The ORDER BY clause identifies which columns are used to sort the resulting data, and in which direction they should be sorted (options are ascending or descending). Without an ORDER BY clause, the order of rows returned by an SQL query is undefined.

What Can SQL do?

  • SQL can execute queries against a database
  • SQL can retrieve data from a database
  • SQL can insert records in a database
  • SQL can update records in a database
  • SQL can delete records from a database
  • SQL can create new databases
  • SQL can create new tables in a database
  • SQL can create stored procedures in a database
  • SQL can create views in a database
  • SQL can set permissions on tables, procedures, and views

Tuesday, June 14, 2011

What is database? (Part II)

The Relational Model          
The relational model is the most popular type of database and an extremely powerful tool, not only to store information, but to access it as well. Relational databases are organized as tables. A table can have many records and each record can have many fields. Tables are sometimes called a relation.
 For instance, a company can have a database called customer orders, within this database will be several different tables or relations all relating to customer orders. Tables can include customer information (name, address, contact, info, customer number, etc) and other tables (relations) such as orders that the customer previously bought (this can include item number, item description, payment amount, payment method, etc

Accessing Information Using a Database
While storing data is a great feature of databases, for many database users the most important feature is quick and simple retrieval of information. In a relational database, it is extremely easy to pull up information regarding an employee, but relational databases also add the power of running queries. Queries are requests to pull specific types of information and either show them in their natural state or create a report using the data. For instance, if you had a database of employees and it included tables such as salary and job description, you can easily run a query of which jobs pay over a certain amount.

Storing a Database
Databases can be very small (less than 1 MB) or extremely large and complicated (terabytes as in many government databases), however all databases are usually stored and located on hard disk or other types of storage devices and are accessed via computer. Large databases may require separate servers and locations, however many small databases can fit easily as files located on your computer's hard drive.

What is database? (Part I)

The database used to be an extremely technical term, however with the rise of computer systems and information technology throughout our culture, the database has become a household term. A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. In one view, databases can be classified according to types of content: bibliographic, full-text, numeric, and images. Computer databases typically contain aggregations of data records or files, such as sales transactions, product catalogs and inventories, and customer profiles.
In order for a database to be truly functional, it must not only store large amounts of records well, but be accessed easily. In addition, new information and changes should also be fairly easy to input. In order to have a highly efficient database system, you need to incorporate a program that manages the queries and information stored on the system. This is usually referred to as DBMS or a Database Management System.  
There are several common types of databases; each type of database has its own data model (how the data is structured). They include; Flat Model, Hierarchical Model, Relational Model and Network Model. 

Thursday, June 9, 2011

Data Integrity


The ability to ensure that persisted data can be retrieved without error is central to the Information Architecture Principle. Without data integrity, a query’s answer cannot be guaranteed to be correct. As data is essentially entities and attributes, data integrity consists of entity integrity and domain integrity, which includes referential integrity and user-defined integrity. Transactional integrity, which deals with how data is written and retrieved, is defined by the ACID principles

Entity integrity involves the structure (primary key and its attributes) of the entity. If the primary key is unique and all attributes are scalar and fully dependent on the primary key, then the integrity of the entity is good. In the physical schema, the table’s primary key enforces entity integrity. Essentially, entity integrity is normalization.

In relational theory terms, a domain is a set of possible values for an attribute, such as integers, bit values, or characters. Domain integrity ensures that only valid data is permitted in the attribute. Nullability (whether a null value is valid for an attribute) is also a part of domain integrity. In the physical schema, the data type and nullability of the row enforce domain integrity.

A subset of domain integrity, referential integrity refers to the domain integrity of foreign keys. Domain integrity says that if an attribute has a value, then that value must be in the domain. In the case of the foreign key, the domain is the list of values in the related primary key. Within a physical schema, a foreign key can be enforced by declarative referential integrity (DRI) or by a custom trigger attached to the table.

Besides the relational theory integrity concerns, the user-integrity requirements must also be enforced, as follows:
. Simple business rules, such as a restriction to a domain, limit the list of valid data entries. Check constraints are commonly used to enforce these rules in the physical schema.
. Complex business rules limit the list of valid data based on some condition. For example, certain tours may require a medical waiver. Implementing these rules in the physical schema generally requires stored procedures or triggers.

Normalization


          The most well-known concept of relational database design, normalization, was developed by Dr. E. F. Codd. It is merely a tool to reduce update anomalies (unusual objects).
It is most beneficial for databases that receive first-generation data, such as a transactional database that receives order data. Therefore, databases that are not used for first-generation data (such as reporting databases, data warehouses, or reference data stores) do not necessarily benefit from normalization.
                             The most common misconception regarding normalization is, believing that more tables mean a higher form of normalization. It is an error to follow the normal forms as if they were a waterfall design process, i.e., beginning with non-normalized design and then correcting for first normal form by eliminating redundant columns, followed by second normal form, and so on.

Normalization is usually defined in terms of normal forms. Each of the normal forms describes a possible error in designing the entities and attributes, and provides a rule for correcting the error. Each progressive form of normalization removes another type of flat-file problem.

A normalized database design has the following advantages over flat-file databases:
  • Improved data integrity owing to the elimination of duplicate storage locations for the same data
  • Reduced locking contention and improved multiple-user concurrency
  • Smaller files

The first normal form means the data is in an entity format, such that the following three conditions are met:
  • Every attribute must contain one unit of data, and each unit of data must fill one attribute. Designs that embed multiple pieces of information within an attribute violate the first normal form.
  • Each attribute must have a unique name and a unique purpose. An entity should have no repeating attributes. If the attributes repeat, or the entity is very wide, the object is too broadly designed. A design that repeats attributes, such as an order entity that includes item1, item2, and item3 attributes to hold multiple line items, violates the first normal form.
  • If the entity design requires or permits duplicate tuples, that design violates the first normal form.

The second normal form ensures that each attribute is in fact an attribute of the entity. It’s an issue of dependency. Every attribute must require its primary key, or it doesn’t belong in the database. If the entity’s primary key is a single value, this isn’t too difficult. Composite primary keys can sometimes get into trouble with the second normal form if the attributes aren’t dependent on every attribute in the primary key. If the attribute depends on one of the primary key attributes but not the other, then that is a partial dependency, which violates the second normal form.

The third normal form checks for transitive dependencies. A transitive dependency is similar to a partial dependency in that they both refer to attributes that are not fully dependent on a primary key. A dependency is transient when attribute1 is dependent on attribute2, which is dependent on the primary key. The third normal form is resolved by moving the nondependent attribute to a new entity.

Data Model


One-to-Many Relationships
          By far the most common relationship is a one-to-many relationship. Several tuples in the secondary entity relate to a single tuple in the primary entity. The relationship is between the primary entity’s primary key and the secondary entity’s foreign key.

One-to-one relationships connect two entities with primary keys at both entities. Because a primary key must be unique, each side of the relationship is restricted to one tuple. For instance, an Employee entity can store general information about the employee.

A design element that leverages the one-to-one relationship is the supertype/subtype relationship. This relationship connects a single supertype entity with multiple subtype entities.  This design is useful when some objects share a majority of attributes but differ in a few attributes, such as customers, vendors, and shippers. All three share name and address attributes, but each has specific attributes.

In a many-to-many relationship, both sides may relate to multiple tuples on the other side of the relationship. For instance, an order may have multiple items, and each item may be sold on multiple orders.

Transactional Faults


Dirty Reads
          The most egregious fault is a transaction’s work being visible to other transactions before the transaction even commits its changes. When a transaction can read another transaction’s uncommitted updates, this is called a dirty read. The problem with dirty reads is that the data being read is not yet committed, so the transaction writing the data might be rolled back.

Non-Repeatable Reads
          A non-repeatable read is similar to a dirty read, but a non-repeatable read occurs when a transaction can see the committed updates from another transaction. Reading a row inside a transaction should produce the same results every time. If reading a row twice results in different values, that’s a non-repeatable read type of transaction fault.

Phantom Rows
          The least severe transactional-integrity fault is a phantom row. Like a non-repeatable read, a phantom row occurs when updates from another transaction affect not only the result set’s data values, but cause the select to return a different set of rows. Of these transactional faults, dirty reads are the most dangerous, while non-repeatable reads are less so, and phantom rows are the least dangerous of all.

Lost Updates
          A lost update occurs when two users edit the same row, complete their edits, and save the data, and the second user’s update overwrites the first user’s update. Because lost updates occur only when two users edit the same row at the same time, the problem might not occur for months. Nonetheless, it’s a flaw in the transactional integrity of the database that needs to be prevented.

Deadlocks
          A deadlock is a special situation that occurs only when transactions with multiple tasks compete for the same data resource. For example, consider the following scenario:

Transaction one has a lock on data A and needs to lock data B to complete its Transaction, and Transaction two has a lock on data B and needs to lock data A to complete its transaction. Each transaction is stuck waiting for the other to release its lock, and neither can complete until the other does. Unless an outside force intercedes, or one of the transactions gives up and quits, this situation could persist until the end of time.