
What is data normalization? [duplicate]

Possible Duplicate:
What exactly does database normalization do?

Can someone please clarify data normalization? What are the different levels? When should I "de-normalize"? Can I over normalize? I have a table with millions of records, and I believe I over-normalized it, but I'm not sure.

Moderator71 asked Oct 06 '10 15:10




2 Answers

If you have millions of columns, you have probably under-normalized it.
Normalizing means that

every non-key attribute "must provide a fact about the key, the whole key, and nothing but the key."

If you have a column that depends on anything but the key, you should normalize your table.
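
One rough way to spot a column that depends on something other than the key is to test candidate dependencies against sample rows. The sketch below is only an illustration: the helper `determines` and the sample data are hypothetical, and sample rows can disprove a dependency but never prove that it holds in general.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, each lhs value maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample rows; ProductID is assumed to be the key.
rows = [
    {"ProductID": 1, "ProductType": "Book", "ProductTypeID": 10},
    {"ProductID": 2, "ProductType": "Book", "ProductTypeID": 10},
    {"ProductID": 3, "ProductType": "Toy", "ProductTypeID": 20},
]

# The key determines every other column, as expected.
key_ok = determines(rows, ["ProductID"], ["ProductType", "ProductTypeID"])
# A non-key column also determines another column: a hint to split the table.
non_key_dep = determines(rows, ["ProductType"], ["ProductTypeID"])
print(key_ok, non_key_dep)
```

If the second check succeeds for a non-key column on realistic data, that column pair is a candidate for its own table.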

Added in reply to a comment:
If you have ProductID | ProductType | ProductTypeID, where ProductTypeID depends only on ProductType (and not on ProductID), you should make a new table for that:
ProductID | ProductTypeID in one table, and ProductTypeID | ProductTypeName in the other.
So, to answer your question: "pertaining to the product" isn't precise enough. In the first case in my example, every column pertained to the product as well. All columns should depend only on ProductID. You could say the table should describe only the product and nothing else, even if that something else is related to the product.
The number of rows, generally speaking, isn't relevant.
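
The two-table split described above could look roughly like this, using Python's built-in `sqlite3` module with an in-memory database. The column types, constraints, and sample values are assumptions made for illustration; only the table and column names come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductType (
    ProductTypeID   INTEGER PRIMARY KEY,
    ProductTypeName TEXT NOT NULL UNIQUE
);
CREATE TABLE Product (
    ProductID     INTEGER PRIMARY KEY,
    ProductTypeID INTEGER NOT NULL REFERENCES ProductType(ProductTypeID)
);
INSERT INTO ProductType VALUES (10, 'Book'), (20, 'Toy');
INSERT INTO Product VALUES (1, 10), (2, 10), (3, 20);
""")

# The type name is stored once, so renaming it updates a single row
# no matter how many products share that type.
updated = conn.execute(
    "UPDATE ProductType SET ProductTypeName = 'Paperback' WHERE ProductTypeID = 10"
).rowcount

# Reading the name back for each product requires a join.
rows = conn.execute("""
    SELECT p.ProductID, t.ProductTypeName
    FROM Product AS p
    JOIN ProductType AS t ON t.ProductTypeID = p.ProductTypeID
    ORDER BY p.ProductID
""").fetchall()
print(updated, rows)
```

Each column in `Product` now describes only the product, and each column in `ProductType` describes only the type.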

Oren A answered Sep 28 '22 07:09


Normalization is about reducing data duplication in a relational database. The most popular level is third normal form (it's the one described by "the key, the whole key, and nothing but the key"), but there are a lot of different levels; see the Wikipedia entry for a list of the main ones. (In practice, people seem to think they're doing well if they achieve third normal form.) Denormalizing means accepting more data duplication, typically in exchange for better performance.
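
To make the trade-off concrete, here is a minimal sketch of a denormalized table, again using `sqlite3` in memory. The table name `ProductFlat` and its data are hypothetical, and the example only shows the duplication cost; it does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductFlat (
    ProductID       INTEGER PRIMARY KEY,
    ProductTypeID   INTEGER NOT NULL,
    ProductTypeName TEXT NOT NULL  -- duplicated on every product row
);
INSERT INTO ProductFlat VALUES (1, 10, 'Book'), (2, 10, 'Book'), (3, 20, 'Toy');
""")

# Reads need no join, which is the usual motivation for denormalizing.
names = conn.execute(
    "SELECT ProductID, ProductTypeName FROM ProductFlat ORDER BY ProductID"
).fetchall()

# A rename has to rewrite every duplicated copy; missing one would leave
# the data inconsistent.
touched = conn.execute(
    "UPDATE ProductFlat SET ProductTypeName = 'Paperback' WHERE ProductTypeID = 10"
).rowcount
print(names, touched)
```

With only three rows the rename already touches two of them; on a table with millions of rows the same update could rewrite a large share of the table.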

Nathan Hughes answered Sep 28 '22 07:09