Database normalization is the process of organizing the attributes and tables of a relational database to minimize redundancy and dependency. There are several normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF), each with its own set of rules to achieve progressively higher levels of normalization. Let’s go through each one with examples:
-
First Normal Form (1NF):
- Rule: Eliminate repeating groups and ensure each column contains atomic values.
- Example: Consider a table
Students
with columnsStudentID
,Name
, andSubjects
whereSubjects
is a comma-separated list. To normalize, create a separate tableStudent_Subjects
with columnsStudentID
andSubject
.
-
Second Normal Form (2NF):
- Rule: Meet the requirements of 1NF and ensure all non-key attributes are fully functional dependent on the primary key.
- Example: Building on the previous example, if
Subjects
depends only onStudentID
, it should be moved to theStudent_Subjects
table.
-
Third Normal Form (3NF):
- Rule: Meet the requirements of 2NF and eliminate transitive dependencies.
- Example: If there’s a table
Books
with columnsISBN
,Title
,Author
, andAuthor_Address
, whereAuthor_Address
depends onAuthor
, moveAuthor_Address
to a new tableAuthors
with columnsAuthor
andAuthor_Address
.
-
Boyce-Codd Normal Form (BCNF):
- Rule: Every determinant must be a candidate key.
- Example: If we have a table
Orders
with columnsOrderID
,ProductID
, andQuantity
, whereProductID
determinesQuantity
, it violates BCNF. To comply, separateProductID
andQuantity
into a new table whereProductID
is the primary key.
-
Fourth Normal Form (4NF):
- Rule: Eliminate multi-valued dependencies.
- Example: Consider a table
Employees
with columnsEmployeeID
,ProjectID
, andResponsibilities
, whereResponsibilities
is multi-valued. To normalize, create a new tableEmployee_Projects
with columnsEmployeeID
,ProjectID
, andResponsibility
.
-
Fifth Normal Form (5NF):
- Rule: Decompose tables to eliminate join dependencies.
- Example: If there’s a table
Employee_Projects
with columnsEmployeeID
,ProjectID
, andResponsibility
, andResponsibility
determinesEmployeeID
, split it into two tables:Employee_Responsibilities
andProject_Responsibilities
.
In essence, normalization ensures efficient database design by reducing redundancy and dependency, leading to better data integrity and query performance. Each normal form builds upon the previous one, with the ultimate goal of eliminating data anomalies and improving overall database structure.