Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to understand the 5th Normal Form?

I'm using two online sources for gaining an understanding of the 5NF, without any rigor of Math and proofs.

  1. A Simple Guide to Five Normal Forms in Relational Database Theory (by Kent. This one seems to have been reviewed and endorsed in one of his writings by none other than CJ Date himself)
  2. Fifth Normal Form (Wikipedia article)

However, I'm unable to understand either of these references!

Let's first examine Reference #1 (Kent's).

It says: "But suppose that a certain rule was in effect: if an agent sells a certain product, and he represents a company making that product, then he sells that product for that company."

and, then, goes on to break up the original table (all table names have been given by me)...

acp(agent, company, product)

-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford    | car     | 
| Smith | Ford    | truck   | 
| Smith | GM      | car     | 
| Smith | GM      | truck   | 
| Jones | Ford    | car     | 
-----------------------------

... into 3 tables:

ac(agent, company)
cp(company, product)
ap(agent, product)

-------------------   ---------------------   ------------------- 
| AGENT | COMPANY |   | COMPANY | PRODUCT |   | AGENT | PRODUCT |
|-------+---------|   |---------+---------|   |-------+---------|
| Smith | Ford    |   | Ford    | car     |   | Smith | car     |
| Smith | GM      |   | Ford    | truck   |   | Smith | truck   |
| Jones | Ford    |   | GM      | car     |   | Jones | car     |
-------------------   | GM      | truck   |   -------------------
                      ---------------------

But I'm not even sure if I understand the English-language meaning of the above rule. My understanding of the above rule is that its 'then' clause is totally redundant! For,

IF an agent is selling a product

AND

IF this agent is representing a company making that product,

THEN, OBVIOUSLY, this agent is selling that product for that company.

So, where is the 'rule' in this statement? It, in fact, seems to be a non-statement to me!

Working backwards from the three tables -- ac, cp, and ap -- it seems the rule really is: "A company may make 1 or more products, an agent may represent 1 or more companies, and when representing a company he may or may not sell all its products."

But the original table acp was already capturing this rule. So, I'm not sure what is going on here with the explanation of 5NF.

Let's now examine Reference #2 (Wikipedia).

It says: Suppose, however, that the following rule applies: "A Traveling Salesman has certain Brands and certain Product Types in his repertoire. If Brand B1 and Brand B2 are in his repertoire, and Product Type P is in his repertoire, then (assuming Brand B1 and Brand B2 both make Product Type P), the Traveling Salesman must offer products of Product Type P those made by Brand B1 and those made by Brand B2."

Once again, going just by the English-language meaning of this rule and nothing else,

IF a salesman has brands B1 and B2, and product P with him,

AND

IF product P is made by both brands B1 and B2,

THEN, why on earth wouldn't he be able to offer product P of brands B1 and B2 just as he could in the original 3-column table 'sbp(salesman, brand, product)' which was serving well even before this new 'rule' came into effect?

Could somebody please clarify?

like image 296
Harry Avatar asked Aug 03 '13 05:08

Harry


People also ask

What do you mean by 5th normal form?

Fifth normal form (5NF), also known as projection–join normal form (PJ/NF), is a level of database normalization designed to remove redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships.

What is a key feature of fifth normal form?

Fifth normal form or 5NF: 5NF is designed to reduce the redundancy in relational databases. To avoid redundancy all the tables are broken into as many tables as possible in 5NF. A table is in 5NF when every non-trivial join dependency is implied by the candidate key of that relation.

What do you understand by fifth normal forms explain with some suitable example?

The 5NF (Fifth Normal Form) is also known as project-join normal form. A relation is in Fifth Normal Form (5NF), if it is in 4NF, and won't have lossless decomposition into smaller tables. You can also consider that a relation is in 5NF, if the candidate key implies every join dependency in it.

Why do we use it into 5th normal form?

Fifth normal form (5NF) A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be lossless. 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy. 5NF is also known as Project-join normal form (PJ/NF).


2 Answers

See, it is much easier to understand the thing backwards.

First the 5NF; a table (relational variable) is in the 5NF if decomposing it would not remove any redundancies. So, it is final NF as far as removing redundancy is concerned.

The original table obviously has some redundancy. It claims that "Smith represents Ford." twice, and "Smith represents GM." twice.

So let's see is it possible to decompose this into two or more projections and reduce some redundancy.

Let's start backwards.

  • Company exists. {COMPANY}

  • Agent exists. {AGENT}

  • Product exists. {PRODUCT}

  • Company makes Product. {COMPANY, PRODUCT}

  • Agent represents Company. {AGENT, COMPANY}

A pause here; suppose a rule was "If an agent represents a company, and the company makes a product, then the agent sells that product".

This would be simply {AGENT, COMPANY} JOIN {COMPANY, PRODUCT} ; but this would generate an extra tuple, namely (Jones, Ford, truck); which is not true because Jones does not sell trucks.

So, not every agent sells every product, hence it is necessary to state that explicitly.

  • Agent sells Product. {AGENT, PRODUCT}

Now if we join

{AGENT, COMPANY} JOIN {COMPANY, PRODUCT} JOIN {AGENT, PRODUCT}

that extra tuple is eliminated by the join to the {AGENT, PRODUCT}.

To grasp things intuitively, the rule can be modified a bit.

Original

If an agent sells a certain product, and he represents a company making that product, then he sells that product for that company.

Modified (same meaning)

If an agent sells product, and agent represents company, and the company makes that product, then agent sells that product for that company.

Explained (substitute from bullet points above)

If {AGENT, PRODUCT} and {AGENT, COMPANY} and {COMPANY, PRODUCT} then {AGENT, COMPANY, PRODUCT}.

So, the rule allows for the join to happen -- and hence the decomposition.

Now compare that to the predicate of the original table:

Agent represents a Company and sells some Product that the company makes.

Not the same as the rule, so it is open to anomalies which would violate the rule -- see Bill Karwin's example.


EDIT (see comments below)

Suppose that we have the original table, but not the rule.

It is obvious that there is some redundancy in the table, so we may wonder if there is a way to remove that redundancy somehow -- usual way is decomposition into projections of the table.

So, after some tinkering, we figure out that it can be decomposed into {AGENT, PRODUCT}, {AGENT, COMPANY}, {COMPANY, PRODUCT}. Current data certainly allows for that -- as per your example.

And we do that, and whenever interested in "Which agent sells which product from which company?" the answer is simply

{AGENT, COMPANY} JOIN {COMPANY, PRODUCT} JOIN {AGENT, PRODUCT}

Then Honda shows up, and they make cars and trucks too. Well, no problem there, just insert (Honda, truck) , (Honda, car) into {COMPANY, PRODUCT}.

Then Smith decides to sell Honda cars, but not trucks. Sorry, no way, oops! Because he already sells cars and trucks, if he wants to represent Honda, he has to sell both.

Because we would have tuples

(Smith, Honda) (Honda, truck) (Smith, truck)
               (Honda, car)   (Smith, car)

So we have introduced the rule! Really did not want to -- was just trying to get rid of some redundancy.

The question is now, was the original dataset just a fluke, or was it a result of a rule which was enforced somehow outside of the DB?

The author (Kent) claims that the rule exists and the design does not match it. Certainly, it would not be a problem for the original table to accept (Smith, Honda, car) only -- not requiring (Smith, Honda, truck).


Theoretical point (ignore if boring)

The rule

If {AGENT, PRODUCT} and {AGENT, COMPANY} and {COMPANY, PRODUCT} then {AGENT, COMPANY, PRODUCT}; for every (Agent, Company, Product) triplet.

explicitly states that join dependency

* { {AGENT, COMPANY}, {COMPANY, PRODUCT}, {AGENT, PRODUCT} }

holds for the original table.


As often stated, cases like this are rare; actually so rare that even textbook examples have to introduce weird rules in order to explain the basic idea.


EDIT II (the fun part, but may help understanding)

Suppose that the rule does not exist, and there is explicit requirement that any agent can sell what ever he wants from any company -- hence the rule would be plain wrong.

In that case we have the original table

{AGENT, COMPANY, PRODUCT}

I would argue that:

  1. Being all-key, it is in BCNF.

  2. It can not be decomposed (current data may allow it, but future does not).

  3. It is in BCNF, all key, it can not be decomposed, hence it is in 5NF.

  4. It is in 5NF and is all-key, hence it is in 6NF.

So, it is the presence or non-existence of the rule that determines if the table is in BCNF or 6NF -- same table same data.

like image 190
Damir Sudarevic Avatar answered Sep 19 '22 07:09

Damir Sudarevic


All normal forms are meant to avoid anomalies, i.e. logical inconsistencies in data.

There's an anomaly possible when you violate 5th normal form, represented by this relation:

-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford    | car     | 
| Smith | Ford    | truck   | 
| Smith | GM      | car     | 
| Jones | Ford    | car     | 
| Jones | GM      | truck   | 
-----------------------------

So we know Jones works for GM and Ford, and we know that Jones sells cars and trucks. And we know (from Smith) that GM makes cars. So why isn't there a row for [Jones, GM, car]? That's an anomaly. Jones should sell GM cars, but there's nothing in this table that makes that consistent.

The problems comes from trying to use one relation to represent multiple independent facts.
If we instead represent these independent facts as independent relations ac, cp, and ap, then we remove the possibility of anomalies.


Re your comment:

For purposes of this example, we assume that a salesman is motivated to sell anything he can. If he can sell a type of vehicle, and he works for a company, and the company makes that type of vehicle, then the salesman will definitely sell it.

This premise is stated in William Kent's article:

But suppose that a certain rule was in effect: if an agent sells a certain product, and he represents a company making that product, then he sells that product for that company.

So based on this premise, it's implicit that every possible valid combination should result in a row in the three-column table. That's a business rule that we'd like the data to satisfy.

But in cases that our single table doesn't contain one of the rows necessary to be consistent with that premise, it fails to represent the business rule. Basically, because it introduces the possibility that a "fact" is stored redundantly.

By separating the facts into three tables, each fact is stored exactly once. The result of a JOIN between the three simpler tables naturally produces a relation that is like the original three-column table, except guaranteed to have no anomalies.

like image 24
Bill Karwin Avatar answered Sep 19 '22 07:09

Bill Karwin