Database Normalization

Database Normalization

Database normalization is the process of organizing data in a database. Proper organization helps to eliminate redundant data and inconsistent dependencies. Tables are created and relationships are established whereby the data is protected and the database is more flexible.

Redundant data can cause many problems including wasting disk space and loss of data integrity. If the same data needs to be changed in more than one place there runs the risk of the data becoming inconsistent. In addition, database normalization makes it easier for a user to find data, removing inconsistent dependencies.

The process of database normalization follows certain rules. Each rule is called a normal form. If the database follows the first rule, it is considered to be in first normal form. While there are multiple forms, only the first three are required for most applications to be normalized. An application that has followed the first three rules is considered to be in third normal form. There are of course instances when the normal forms cannot be followed. In these cases more care must be given to ensuring that redundant data and inconsistent dependencies are at a minimum.

In the following examples, the database normalization process will be established on a database that tracks specific tasks by project.

First Normal Form

The first step in the database normalization process is to eliminate repeating groups in a table.

UnnormalizedData

In the example above, each project has been assigned two tasks. This design does not allow for additional tasks to be assigned to the project without first modifying the table and program design. The database is very inflexible. The repeating group is tasks. The table needs to be modified to accommodate an infinite number of tasks as shown here.

FirstNormalForm

 

Second Normal Form

The second step in the database normalization process is to eliminate redundant data in a table. The redundant data in this table are Project ID, Project and Reference #. This table needs to be split in two and then linked to a primary key.

Projects-SecondNormalForm

Tasks-SecondNormalForm

Now the Tasks are linked to the Project by the primary key, ProjectID. The relationship between these two tables looks like this:

SecondNormalForm-Relationships

Third Normal Form

The third and usually final step in the database normalization process is to eliminate data not directly dependent upon the primary key. The unrelated data are the Owner and Owner Address. An Owner table can be created to store the Owner information.

Owners-ThirdNormalForm

In this example the owner information has been moved to a new table. In addition, the address information has been broken out in to different fields to allow for a more flexible design. The Project table has now been modified as well.

Projects-ThirdNormalForm

The Owner in the Projects2 table is linked to the primary key Owner in the Owners table.

The relationship among the three tables looks like this:

ThirdNormalForm-Relationships

 

Fourth Normal Form

The fourth step in the database normalization process is to remove any multi-valued dependencies.

 

Fifth Normal Form

The fifth step in the database normalization process is to ensure that there are no non-trivial join dependencies.