Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Am I Properly Normalizing this Data

I am completing normalization exercises from the web to test my abilities to normalize data. This particular problem was found at: https://cs.senecac.on.ca/~dbs201/pages/Normalization_Practice.htm (Exercise 1)

The table this problem is based of is as follows: Table

The unnormalized table that can be created from this table is:

enter image description here

To comply with First Normal form, I have to get rid of repeating fields in the table by moving visitdate, procedure_no, and procedure_name to their own respective tables:

enter image description here

This also complies with 2NF and 3NF which makes me question whether I have performed the process of normalization correctly. Please provide feedback if I did not properly move from UNF to 1NF.

like image 765
Zampanò Avatar asked May 24 '18 15:05

Zampanò


People also ask

How do I know if I need to normalize my data?

Normalization is useful when your data has varying scales and the algorithm you are using does not make assumptions about the distribution of your data, such as k-nearest neighbors and artificial neural networks. Standardization assumes that your data has a Gaussian (bell curve) distribution.

What does normalizing your data mean?

Put simply, data normalization ensures that your data looks, reads, and can be utilized the same way across all of the records in your customer database. This is done by standardizing the formats of specific fields and records within your customer database.

What are the 3 rules in normalizing database?

Normal forms Boyce defined the Boyce–Codd normal form (BCNF) in 1974. Informally, a relational database relation is often described as "normalized" if it meets third normal form. Most 3NF relations are free of insertion, updation, and deletion anomalies.


2 Answers

In a first step you could create the following tables (assuming pet_id is unique in the table):

Pets:   pet_id, pet_name, pet_type, pet_age, owner
Visits: pet_id, visit_date, procedure

Going further you could split procedure since the description is repeating:

Pets:       pet_id, pet_name, pet_type, pet_age, owner
Visits:     pet_id, visit_date, procedure_id
Procedures: procedure_id, description

Although there can be multiple procedures on the same visit_date for the same pet_id, I see no reason to split those further: a date could (in theory) be stored in 2 bytes, and splitting that data would create more overhead (plus an extra index).

You would also want to change pet_age to pet_birth_date since the age changes over time.

Since this is the first exercise in your list, the above will probably be more than enough.

Going even further:

An owner can have multiple pets, so another table could be created:

Pet_owners: owner_id, owner_name

and then only use owner_id in the Pets table. In a real system there would be customer_id, name, address, phone, email, etc. - so that should always be in a separate table.

You could even do the same for pet_type and store the id in 1 or 2 bytes, but it all depends on the type of queries you want to do later on the data.

like image 184
Danny_ds Avatar answered Oct 02 '22 20:10

Danny_ds


The question is poorly presented. Look at the last two columns. The askers do not mean that each column's types are sets. They mean that pairs of values on the same line make an element of a set. They should have had one column whose values were triplets--date, number & name. That's what they did when they used just one column (the last one) for number & name. Notice that their solution in the pdf linked to by the page you link to has a table that has all three of date, number & name.

But how are you supposed to know that the values should be paired? After all if the date column gave the set of a pet's visit dates & the procedure column gave the set of procedure number & names a pet ever had then we wouldn't be supposed to take a pair of values on the same line as an element of a set. Unfortunately you are just supposed to magically guess correctly. (A hint is that the number of dates & number-name pairs for a pet are always the same.)

The above took the blank areas in the illustration to be there to make room for the vertical display of set-valued attributes; the portrayed table has 4 rows. But maybe they are there because you are supposed to get a relation from this illustration by interpreting a blank subrow as representing the most recent non-blank subrow. Then the table wouldn't have any set-valued columns; the portrayed table has 9 rows. It happens that this interpretation disagrees with the linked answer's UNF & 1NF sections.

If they weren't going to explain the table & were just relying on your guesses it would have been clearer if they put a visit's procedure date, number & name under one column--just as they put a procedure number & name in one column. But really, they should always tell you how to read the illustration. But really, you should always ask how read an illustration. If you have any interpretation conventions from a related course/textbook then you should have put it in your question for us to know.

Unfortunately "UNF" tables are almost always similarly poorly given without any description about how they are to be interpreted. Also "1NF" has no standard meaning & there is no standard notion of "normalizing to 1NF".

like image 35
philipxy Avatar answered Oct 02 '22 19:10

philipxy