I am completing normalization exercises from the web to test my abilities to normalize data. This particular problem was found at: https://cs.senecac.on.ca/~dbs201/pages/Normalization_Practice.htm (Exercise 1)
The table this problem is based on is as follows:
The unnormalized table that can be created from this table is:
To comply with first normal form (1NF), I have to get rid of the repeating fields in the table by moving visitdate, procedure_no, and procedure_name to their own respective tables:
This also complies with 2NF and 3NF, which makes me question whether I have performed the process of normalization correctly. Please provide feedback if I did not properly move from UNF to 1NF.
Codd and Boyce defined the Boyce–Codd normal form (BCNF) in 1974. Informally, a relational database relation is often described as "normalized" if it meets third normal form. Most 3NF relations are free of insertion, update, and deletion anomalies.
In a first step you could create the following tables (assuming pet_id is unique in the table):
Pets: pet_id, pet_name, pet_type, pet_age, owner
Visits: pet_id, visit_date, procedure
Going further, you could split procedure since the description is repeating:
Pets: pet_id, pet_name, pet_type, pet_age, owner
Visits: pet_id, visit_date, procedure_id
Procedures: procedure_id, description
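As a rough illustration of the three tables above, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the composite primary key on Visits, and all sample rows are assumptions made for the example; they are not taken from the exercise.

```python
import sqlite3

# Table and column names follow the answer; types and keys are assumptions.
SCHEMA = """
CREATE TABLE Pets (
    pet_id   INTEGER PRIMARY KEY,
    pet_name TEXT NOT NULL,
    pet_type TEXT NOT NULL,
    pet_age  INTEGER,
    owner    TEXT NOT NULL
);
CREATE TABLE Procedures (
    procedure_id INTEGER PRIMARY KEY,
    description  TEXT NOT NULL
);
CREATE TABLE Visits (
    pet_id       INTEGER NOT NULL REFERENCES Pets(pet_id),
    visit_date   TEXT    NOT NULL,  -- ISO 8601 date string
    procedure_id INTEGER NOT NULL REFERENCES Procedures(procedure_id),
    -- Several procedures may share a date, so the key spans all three columns.
    PRIMARY KEY (pet_id, visit_date, procedure_id)
);
"""


def build_demo_db():
    """Create an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Pets VALUES (1, 'Rex', 'dog', 5, 'A. Owner')")
    conn.executemany(
        "INSERT INTO Procedures VALUES (?, ?)",
        [(10, "vaccination"), (20, "check-up")],
    )
    conn.executemany(
        "INSERT INTO Visits VALUES (?, ?, ?)",
        [(1, "2002-01-13", 10), (1, "2002-01-13", 20), (1, "2002-03-27", 20)],
    )
    conn.commit()
    return conn


def visits_for(conn, pet_id):
    """Return (visit_date, description) pairs for one pet, oldest first."""
    cursor = conn.execute(
        """
        SELECT v.visit_date, p.description
        FROM Visits AS v
        JOIN Procedures AS p ON p.procedure_id = v.procedure_id
        WHERE v.pet_id = ?
        ORDER BY v.visit_date, p.procedure_id
        """,
        (pet_id,),
    )
    return cursor.fetchall()
```

Calling visits_for(build_demo_db(), 1) joins each visit back to its procedure description, so the description is stored once in Procedures instead of being repeated on every visit row.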
Although there can be multiple procedures on the same visit_date for the same pet_id, I see no reason to split those further: a date could (in theory) be stored in 2 bytes, and splitting that data would create more overhead (plus an extra index).
You would also want to change pet_age to pet_birth_date since the age changes over time.
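To show why storing a birth date is enough, here is a small sketch that derives the age on demand. The function name age_in_years is made up for this example.

```python
from datetime import date


def age_in_years(birth_date: date, today: date) -> int:
    """Whole years elapsed between birth_date and today."""
    years = today.year - birth_date.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years
```

The stored value never goes stale this way; only the query date changes.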
Since this is the first exercise in your list, the above will probably be more than enough.
Going even further:
An owner can have multiple pets, so another table could be created:
Pet_owners: owner_id, owner_name
and then only use owner_id in the Pets table. In a real system there would be customer_id, name, address, phone, email, etc., so that should always be in a separate table.
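The owner split can be sketched in plain Python as follows. This assumes, purely for illustration, that an owner is identified by name alone (a real system would need a proper customer key); the function name split_owners and the dict-based row format are hypothetical.

```python
def split_owners(pets):
    """Split flat pet rows into Pet_owners rows and Pets rows.

    pets: list of dicts that each carry an 'owner' key holding the owner name.
    Assumption: two pets with the same owner name belong to the same owner.
    """
    owner_ids = {}  # owner_name -> owner_id, in order of first appearance
    pet_rows = []
    for pet in pets:
        # Reuse the id for a known owner, otherwise assign the next one.
        owner_id = owner_ids.setdefault(pet["owner"], len(owner_ids) + 1)
        row = {key: value for key, value in pet.items() if key != "owner"}
        row["owner_id"] = owner_id
        pet_rows.append(row)
    pet_owners = [
        {"owner_id": owner_id, "owner_name": name}
        for name, owner_id in owner_ids.items()
    ]
    return pet_owners, pet_rows
```

Each owner name then appears once in Pet_owners, and the Pets rows refer to it by owner_id only.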
You could even do the same for pet_type and store the id in 1 or 2 bytes, but it all depends on the type of queries you want to do later on the data.
The question is poorly presented. Look at the last two columns. The askers do not mean that each column's types are sets. They mean that pairs of values on the same line make an element of a set. They should have had one column whose values were triplets--date, number & name. That's what they did when they used just one column (the last one) for number & name. Notice that their solution in the pdf linked to by the page you link to has a table that has all three of date, number & name.
But how are you supposed to know that the values should be paired? After all if the date column gave the set of a pet's visit dates & the procedure column gave the set of procedure number & names a pet ever had then we wouldn't be supposed to take a pair of values on the same line as an element of a set. Unfortunately you are just supposed to magically guess correctly. (A hint is that the number of dates & number-name pairs for a pet are always the same.)
The above took the blank areas in the illustration to be there to make room for the vertical display of set-valued attributes; the portrayed table has 4 rows. But maybe they are there because you are supposed to get a relation from this illustration by interpreting a blank subrow as representing the most recent non-blank subrow. Then the table wouldn't have any set-valued columns; the portrayed table has 9 rows. It happens that this interpretation disagrees with the linked answer's UNF & 1NF sections.
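The two readings can be made concrete with a small sketch. The printed lines below are invented for illustration (they are not the exercise's data), None stands for a blank cell, and both function names are hypothetical.

```python
# Hypothetical printed lines: (pet_id, pet_name, visit_date, procedure).
# None marks a cell that is blank in the illustration.
PRINTED = [
    (1, "Rex", "2002-01-13", "10 - vaccination"),
    (None, None, "2002-03-27", "20 - check-up"),
    (2, "Tom", "2002-01-21", "20 - check-up"),
]


def as_set_valued(lines):
    """Reading 1: one row per pet, with the visit lines collected into
    a single set-valued attribute of (visit_date, procedure) pairs."""
    rows = []
    for pet_id, pet_name, visit_date, procedure in lines:
        if pet_id is not None:
            # A non-blank pet_id starts a new row.
            rows.append({"pet_id": pet_id, "pet_name": pet_name, "visits": []})
        rows[-1]["visits"].append((visit_date, procedure))
    return rows


def as_fill_down(lines):
    """Reading 2: a blank cell repeats the most recent non-blank value
    above it, so every printed line becomes its own row."""
    rows = []
    previous = None
    for line in lines:
        if previous is not None:
            line = tuple(
                prev if cell is None else cell
                for cell, prev in zip(line, previous)
            )
        rows.append(line)
        previous = line
    return rows
```

With this sample, the first reading yields one row per pet and the second yields one row per printed line, which is the same difference as 4 rows versus 9 rows in the exercise's illustration.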
If they weren't going to explain the table & were just relying on your guesses, it would have been clearer if they had put a visit's procedure date, number & name under one column, just as they put a procedure number & name in one column. But really, they should always tell you how to read the illustration, and you should always ask how to read one. If you have any interpretation conventions from a related course/textbook then you should have put them in your question for us to know.
Unfortunately "UNF" tables are almost always similarly poorly given without any description about how they are to be interpreted. Also "1NF" has no standard meaning & there is no standard notion of "normalizing to 1NF".