Database NORMALIZATION

Database NORMALIZATION is used to reduce data redundancy and eliminate undesirable characteristics like Database Anomalies (Update anomaly, Insertion anomaly, Deletion anomaly). This also divides larger tables into smaller tables and links them using relationships and follows simple data structure design rules to make the data easy to understand, maintain and query.

Rule of Clarity – Clarity is better than cleverness.

Rule of Simplicity – Design for simplicity; add complexity only where you must.

Rule of Transparency – Design for visibility to make inspection and debugging easier.

Rule of Robustness – Robustness is the child of transparency and simplicity.

Here is a list of Normal Forms

  • 1NF (First Normal Form)
  • 2NF (Second Normal Form)
  • 3NF (Third Normal Form)
  • BCNF (Boyce-Codd Normal Form)
  • 4NF (Fourth Normal Form)
  • 5NF (Fifth Normal Form)
  • 6NF (Sixth Normal Form)

Database Normalization Forms

The Normal Forms are designed to avoid database anomalies, and they help in following the listed rules seen before.

First Normal Form (1NF)

The database is in First normal form if

  • There are no duplicated rows in the table.
  • Entries in a column (field) are of the same kind.
  • Each cell is single-valued or atomic value (no repeating groups or arrays).

Example 01

EMP NOLAST NAME FIRST NAME DEPT NAME
1CarciaCherylMaths, Physics
2AubeNormanPhysics
3FrugmanLeonaidBotany
4SpivakIanZoology
5DohnLindaChemistry

Here Cheryl Carcia is associated with two departments (Maths & Physics) which is making DEPT NAME non atomic. To bring it to first normal form we need to break it into two tables on basis of Employee Data and Department Data.

Below two tables are holding the atomic values and duplicates removed.

Table: Employee Table

EMP NOLAST NAME FIRST NAME 
1CarciaCheryl
2AubeNorman
3FrugmanLeonaid
4SpivakIan
5DohnLinda

Table: Department table

EMP NODEPT NAME
1Maths
1Physics
2Physics
3Botany
4Zoology
5Chemistry

Second Normal Form (2NF)

A table is in 2NF 

  • If it is in 1NF
  • There should not be any partial dependency of any column on primary key. Means the table have concatenated primary key and each attribute in table depends on that concatenated primary key.
  • All Non-key attributes are fully functionally dependent on primary key.If primary is is not composite key then all non key attributes are fully functionally dependent on primary key.

Since a partial dependency occurs when a non-key attribute is dependent on only a part of the composite key, the definition of 2NF is sometimes phrased as: “A table is in 2NF if it is in 1NF and if it has no partial dependencies.”

EMP NODEPT NOLAST NAME FIRST NAME DEPT NAME
1111CarciaCherylMaths
2222AubeNormanPhysics
3111FrugmanLeonaidMaths
4222SpivakIanPhysics
5333DohnLindaChemistry

Composite key (EMP NO,DEPT NO). LAST NAME and FIRST NAME dependent on EMP NO. DEPT NAME is dependent on DEPT NO. So this table can be broken into two tables as given below

Table: Employee Table

EMP NODEPT NOLAST NAME FIRST NAME 
1111CarciaCheryl
2222AubeNorman
3111FrugmanLeonaid
4222SpivakIan
5333DohnLinda

Table: Department Table

DEPT NODEPT NAME
111Maths
222Physics
333Chemistry

Third Normal Form and BCNF (Boyce-Codd Normal Form)

A table is in 3NF if it is in 2NF and if it has no transitive dependencies.

A table is in BCNF if it is in 3NF and if every determinant is a candidate key. BCNF does not allow dependencies between attributes that belong to candidate keys. It drops the restriction of the non-key attributes from the third normal form.

EMP NORECORD NOLAST NAME FIRST NAME ATTENDANCE
1A001CarciaCheryl100
2B001AubeNorman50
3C001FrugmanLeonaid25
4D001SpivakIan15
5E001DohnLinda10

In the above table EMP NO determines the RECORD NO and RECORD NO determines LAST NAME & FIRST NAME. Hence EMP NO determines LAST NAME & FIRST NAME. RECORD NO determines the ATTENDANCE. This structure is not satisfying the Third Normal Form.

Table: Employee Table

EMP NORECORD NOLAST NAME FIRST NAME 
1A001CarciaCheryl
2B001AubeNorman
3C001FrugmanLeonaid
4D001SpivakIan
5E001DohnLinda

Table: Attendance Table

RECORD NOATTENDANCE
A001100
B00150
C00125
D00115
E00110

Fourth Normal Form (4NF)

If no database table instance contains two or more, independent and multivalued data describing the relevant entity, then it is in 4th Normal Form.

Fifth Normal Form (5NF)

A table is in 5th Normal Form only if it is in 4NF and it cannot be decomposed into any number of smaller tables without loss of data.

Sixth Normal Form (6NF)

6th Normal Form is not standardized, yet however, it is being discussed by database experts for some time. Hopefully, we would have a clear & standardized definition for 6th Normal Form in the near future

Primary Key

A primary is a single column value used to identify a database record uniquely.

It has the following attributes

  • A primary key cannot be NULL
  • A primary key value must be unique
  • The primary key values should rarely be changed
  • The primary key must be given a value when a new record is inserted.

Composite Key

A composite key is a primary key composed of multiple columns used to identify a record uniquely.

Foreign Key

Foreign Key references the primary key of another Table! It helps connect your Tables

  • A foreign key can have a different name from its primary key
  • It ensures rows in one table have corresponding rows in another
  • Unlike the Primary key, they do not have to be unique. Most often they aren’t
  • Foreign keys can be null even though primary keys can not 

Database DE-NORMALIZATION doesn’t mean not normalizing. It’s a step in the process. First we normalize, then we realize that we now have hundreds or thousands of small tables and that the performance cost of all those joins will be prohibitive, and then we carefully apply some denormalization techniques so that our application will return results before the werewolves have overrun the city.

Database Normalization Vs De-normalization

NORMALIZATIONDE-NORMALIZATION
Database Normalization is the process of dividing the data into multiple tables so that data redundancy and data integrities are achieved.Database De-Normalization is the opposite process of normalization where the data from multiple tables are combined into one table so that data retrieval will be faster.
It removes data redundancy i.e. it eliminates any duplicate data from the same table and puts it into a separate new table.It creates data redundancy i.e.; duplicate data may be found in the same table.
It maintains data integrity i.e.any addition or deletion of data from the table will not create any mismatch in the relationship of the tables.It may not retain the data integrity.
It increases the number of tables in the database and hence the joins to get the result.It reduces the number of tables and hence reduces the number of joins. Hence the performance of the query is faster here compared to normalized tables.
Even though it creates multiple tables, inserts, updates and deletes are more efficient in this case. If we have to insert/update/delete any data, we have to perform the transaction in that particular table. Hence there is no fear of data loss or data integrity.In this case, all the duplicate data are at a single table and care should be taken to insert/delete/update all the related data in that table. Failing to do so will create data integrity issues.
Use normalized tables where more number of insert/update/delete operations are performed and joins of those tables are not expensive.Use de-normalization where joins are expensive and frequent query is executed on the tables.

Summary

  • Database Normalization helps produce database systems that are cost-effective and have better security models.
  • Functional dependencies are a very important component of the normalize data process
  • Most database systems are normalized database up to the third normal forms.
  • A primary key uniquely identifies are record in a Table and cannot be null
  • A foreign key helps connect table and references a primary key

COBOL Blog: Click Here. IBM Reference: Click Here

Scroll to Top