Understanding Database Normalization

Introduction

Data in the Database is stored in terms of enormous quantities. Retrieving certain data will be a tedious task if the data is not organized correctly. With the help of Normalization, we can organize this data and also reduce the redundant data.

Before jumping to the main topic let's go through the evolution of DBMS first. Earlier, File System management was used where data was stored in terms of files. File system management without data redundancy, data inconsistency, user concurrency, security, and protocols for searching data from files.

DBMS software came into the light where structured data is stored in terms of tables & it provides a systematic way for creating and managing databases. Now, it has become very easy for users to store, process & analyze their data with the help of DBMS.

In the database, data is stored in the form of tables which are also known as Relations, so we call it RDBMS, where R stands for Relational.RDBMS is the basis for SQL, and for all modern database systems like MySQL.

What is Database normalization?

Database Normalisation is a systematic technique of organizing the data in the database for eliminating issues like Data Redundancy.
Normalization split a large table into smaller tables and defines relationships between them to increase the clarity in organizing data.
The stages of an organization are called normal forms.
Simply put, it is the process of developing clean data. This includes eliminating redundant and unstructured data and making the data appear similar across all records and fields.

Why do we need Normalization?

It is the process of reducing the redundancy of data in the table and also improving the data integrity. So why is this required? without Normalization, we may face many issues such as:

Insertion anomaly: It occurs when we cannot insert data into the table without the presence of another attribute.

Update anomaly: It is a data inconsistency that results from data redundancy and a partial update of data.

Deletion Anomaly: It occurs when certain attributes are lost because of the deletion of other attributes.

What are Keys ?

Before moving on to the different forms of data normalization, let's first understand the concept of keys. A key can be a single column or a combination of columns that uniquely identify the rows (or tuples) in the table. It also helps to identify duplicate information and establish relationships between different tables.

Here are the most common type of keys:

Primary key - A single column used to uniquely identify a table
Composite key - A set of columns used to uniquely identify the rows in a table
Foreign key - A key that references the primary key of another table

Types of Normalization Techniques

The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as the first normal form or 1NF) through five (the fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF, along with the occasional 4NF. The fifth normal form is very rarely seen.

1NF(First Normal Form)

The most basic form of data normalization is 1NF which ensures there are no two same entries in a group. For a table to be in the first normal form, it should satisfy the following rules:

Each cell should contain a single value
Each record should be unique

Example:

2NF(Second Normal Form)

In a 2NF table, all the subsets of data that can be placed in multiple rows are placed in separate tables. For a table to be in the second normal form, it should satisfy the following rules:

It should be in 1F
The primary key should not be functionally dependent on any subset of candidate key

Example

A product table:

Screenshot from 2022-08-31 21-16-19.png

Product table following 2NF

Products category table:

Brands Category Table: Screenshot from 2022-08-31 21-17-12.png

Products and brand table: Screenshot from 2022-08-31 21-17-33.png

3NF (Third Normal Form)

For a table to be in the third normal form, it should satisfy the following rules:

It should be in 2F
It should not have any transitive functional dependencies*

*A transitive functional dependency is when a change in a column (which is not a primary key) may cause any of the other columns to change.

BCNF - Boyce and Codd Normal Form

Boyce and Codd Normal Form is a higher version of 3NF and is also known as 3.5NF. A BCNF is a 3NF table that does not have multiple overlapping candidate keys. For a table to be in BCNF, it should satisfy the following rules:

It should be in 3F
For each functional dependency ( X → Y ), X should be a super key Benefits

4NF (Fourth Normal Form)

If no database table instance contains two or more, independent and multivalued data describing the relevant entity, then it is in the 4th Normal Form.

5NF (Fifth Normal Form)

A table is in the 5th Normal Form only if it is in 4NF and it cannot be decomposed into any number of smaller tables without the loss of data.

Conclusion

Well, this is a highly simplified explanation for Database Normalization. One can study this process extensively, though. After working with databases for some time, you'll automatically create Normalized databases, as it's logical and practical. Suggestions for any betterment of this article are highly welcome. Do let me know your reviews about this blog in the comments.