Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Normalization in plain English

I understand the concept of database normalization, but always have a hard time explaining it in plain English - especially for a job interview. I have read the wikipedia post, but still find it hard to explain the concept to non-developers. "Design a database in a way not to get duplicated data" is the first thing that comes to mind.

Does anyone has a nice way to explain the concept of database normalization in plain English? And what are some nice examples to show the differences between first, second and third normal forms?

Say you go to a job interview and the person asks: Explain the concept of normalization and how would go about designing a normalized database.

What key points are the interviewers looking for?

like image 975
Yada Avatar asked Feb 25 '10 05:02

Yada


People also ask

What is normalization in simple words?

Normalization is the process of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.

What is Normalisation with example?

Normalization is a database design technique that reduces data redundancy and eliminates undesirable characteristics like Insertion, Update and Deletion Anomalies. Normalization rules divides larger tables into smaller tables and links them using relationships.

What is normalization explain all normal forms?

Normalization is the process of minimizing redundancy from a relation or set of relations. Redundancy in relation may cause insertion, deletion, and update anomalies. So, it helps to minimize the redundancy in relations. Normal forms are used to eliminate or reduce redundancy in database tables.

What is normalization and why it is needed?

Normalization is a technique for organizing data in a database. It is important that a database is normalized to minimize redundancy (duplicate data) and to ensure only related data is stored in each table. It also prevents any issues stemming from database modifications such as insertions, deletions, and updates.


2 Answers

Well, if I had to explain it to my wife it would have been something like that:

The main idea is to avoid duplication of large data.

Let's take a look at a list of people and the country they came from. Instead of holding the name of the country which can be as long as "Bosnia & Herzegovina" for every person, we simply hold a number that references a table of countries. So instead of holding 100 "Bosnia & Herzegovina"s, we hold 100 #45. Now in the future, as often happens with Balkan countries, they split to two countries: Bosnia and Herzegovina, I will have to change it only in one place. well, sort of.

Now, to explain 2NF, I would have changed the example, and let's assume that we hold the list of countries every person visited. Instead of holding a table like:

Person   CountryVisited   AnotherInformation   D.O.B. Faruz    USA              Blah Blah            1/1/2000 Faruz    Canada           Blah Blah            1/1/2000 

I would have created three tables, one table with the list of countries, one table with the list of persons and another table to connect them both. That gives me the most freedom I can get changing person's information or country information. This enables me to "remove duplicate rows" as normalization expects.

like image 137
Faruz Avatar answered Sep 26 '22 04:09

Faruz


One-to-many relationships should be represented as two separate tables connected by a foreign key. If you try to shove a logical one-to-many relationship into a single table, then you are violating normalization which leads to dangerous problems.

Say you have a database of your friends and their cats. Since a person may have more than one cat, we have a one-to-many relationship between persons and cats. This calls for two tables:

Friends Id | Name | Address ------------------------- 1  | John | The Road 1 2  | Bob  | The Belltower   Cats Id | Name   | OwnerId  --------------------- 1  | Kitty  | 1 2  | Edgar  | 2 3  | Howard | 2 

(Cats.OwnerId is a foreign key to Friends.Id)

The above design is fully normalized and conforms to all known normalization levels.

But say I had tried to represent the above information in a single table like this:

Friends and cats Id | Name | Address       | CatName ----------------------------------- 1  | John | The Road 1    | Kitty      2  | Bob  | The Belltower | Edgar   3  | Bob  | The Belltower | Howard  

(This is the kind of design I might have made if I was used to Excel-sheets but not relational databases.) A single-table approach forces me to repeat some information if I want the data to be consistent. The problem with this design is that some facts, like the information that Bob's address is "The belltower" is repeated twice, which is redundant, and makes it difficult to query and change data and (the worst) possible to introduce logical inconsistencies.

Eg. if Bob moves I have to make sure I change the address in both rows. If Bob gets another cat, I have to be sure to repeat the name and address exactly as typed in the other two rows. E.g. if I make a typo in Bob's address in one of the rows, suddenly the database has inconsistent information about where Bob lives. The un-normalized database cannot prevent the introduction of inconsistent and self-contradictory data, and hence the database is not reliable. This is clearly not acceptable.

Normalization cannot prevent you from entering wrong data. What normalization prevents is the possibility of inconsistent data.

It is important to note that normalization depends on business decisions. If you have a customer database, and you decide to only record a single address per customer, then the table design (#CustomerID, CustomerName, CustomerAddress) is fine. If however you decide that you allow each customer to register more than one address, then the same table design is not normalized, because you now have a one-to-many relationship between customer and address. Therefore you cannot just look at a database to determine if it is normalized, you have to understand the business model behind the database.

like image 43
JacquesB Avatar answered Sep 23 '22 04:09

JacquesB