Normalization of Tables: From BCNF to 4NF
BCNF (Boyce-Codd Normal Form) is a higher level of database normalization than 3NF (Third Normal Form) and aims to further reduce data redundancy and improve data integrity. However, there may still be some cases where BCNF is not sufficient for eliminating all forms of data redundancy, and that is where 4NF (Fourth Normal Form) comes into play.
To understand the concept of 4NF, let's first review the definition of BCNF. A table is in BCNF if every determinant is a candidate key. A determinant is an attribute or a set of attributes that determines another attribute in the table. For example, in a table that stores information about employees and their departments, the department ID is a determinant because it determines the department name, location, and other attributes.
Now, let's consider an example of a table that is in BCNF but still has some redundancy issues:
Employee_Project
----------------
EmployeeID
ProjectID
ProjectName
EmployeeName
HoursWorked
In this table, the primary key is a combination of EmployeeID
and ProjectID
, and every determinant is a candidate key. However, there is still some redundancy in the table because the ProjectName
attribute is dependent on the ProjectID
determinant, which means that the same ProjectName
may be repeated multiple times for different ProjectID
values.
To further normalize this table, we need to split it into two tables, one for employee-project relationships and another for project information:
Employee_Project
----------------
EmployeeID
ProjectID
HoursWorked
Project
-------
ProjectID (primary key)
ProjectName
Now, the ProjectName
attribute is not repeated anymore and is stored in a separate table, which eliminates the redundancy issue.
However, there may still be cases where there are multiple independent multi-valued facts in a table that cannot be addressed by BCNF or 3NF. For example, consider a table that stores information about a company's employees, projects, and the tasks each employee performs on each project:
Employee_Project_Task
----------------------
EmployeeID
ProjectID
TaskID
EmployeeName
ProjectName
TaskName
HoursWorked
In this table, the primary key is a combination of EmployeeID
, ProjectID
, and TaskID
. Every determinant is a candidate key, but there are still some issues with redundancy. Specifically, the table contains multiple independent multi-valued facts, such as EmployeeName
and ProjectName
that are not dependent on the primary key and can have multiple values associated with each combination of EmployeeID
, ProjectID
, and TaskID
.
To address this issue, we need to further normalize the table to 4NF. To do so, we can split it into three tables, one for employee information, another for project information, and a third for task information:
Employee
--------
EmployeeID (primary key)
EmployeeName
Project
-------
ProjectID (primary key)
ProjectName
Task
----
TaskID (primary key)
TaskName
Employee_Project_Task
----------------------
EmployeeID (foreign key)
ProjectID (foreign key)
TaskID (foreign key)
HoursWorked
Now, each table contains only independent single-valued facts, and the Employee_Project_Task
table contains only the primary keys of the Employee
, Project
, and Task
tables, which eliminates all forms of data redundancy.
To summarize, 4NF is a higher level of database normalization than BCNF, and it aims to eliminate all forms of data redundancy that cannot be addressed by BCNF or 3NF. In cases where there are multiple independent multi-valued facts in a table