I have some transitive dependencies in my database design. I have been told by my superiors that these can cause bugs. I am finding it difficult to find resources that will tell me how having these dependencies will cause bugs. What kind of problems will they cause?
I am not disputing the fact, just eager to learn what kind of problems they can cause.
Edit for more details:
From Wikipedia:
Transitive dependency
A transitive dependency is an indirect functional dependency, one in which X→Z only by virtue of X→Y and Y→Z.
Avoid transitive dependencies to help ensure normalization
A transitive dependency in a database is an indirect relationship between values in the same table that causes a functional dependency. To achieve the normalization standard of Third Normal Form (3NF), you must eliminate any transitive dependency.
What is Transitive Dependency in DBMS?
Whenever some indirect relationship happens to cause a functional dependency (FD), it is known as a transitive dependency. Thus, if A -> B and B -> C are true, then A -> C is a transitive dependency. To achieve 3NF, one must eliminate the transitive dependency.
The normalization of 2NF relations to 3NF involves the removal of transitive dependencies. If a transitive dependency exists, we remove the transitively dependent attribute(s) from the relation by placing the attribute(s) in a new relation along with a copy of the determinant.
I'll explain by an example:
-----------------------------------------------------------
| Course  | Field       | Instructor   | Instructor Phone |
-----------------------------------------------------------
| English | Languages   | John Doe     | 0123456789       |
| French  | Languages   | John Doe     | 0123456789       |
| Drawing | Art         | Alan Smith   | 9856321158       |
| PHP     | Programming | Camella Ford | 2225558887       |
| C++     | Programming | Camella Ford | 2225558887       |
-----------------------------------------------------------
- Given a Course you can easily get its Instructor, so Course->Instructor.
- Given an Instructor you can't get his Course, as he might be teaching different courses.
- Given an Instructor you can easily get his Phone, so Instructor->Phone.

That means that if you have a Course then you can get the Instructor Phone, which means Course->Instructor Phone (i.e. a transitive dependency).
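As a rough illustration of the reasoning above, here is a minimal Python sketch that checks whether each value of one column maps to exactly one value of another column in the sample rows. The helper name `holds` is made up for this example. Keep in mind that a functional dependency is a rule about the data model, so a check on sample data can only show that the rule is not violated by these particular rows.

```python
from collections import defaultdict

# Rows copied from the example table above.
COURSES = [
    {"Course": "English", "Field": "Languages", "Instructor": "John Doe", "Instructor Phone": "0123456789"},
    {"Course": "French", "Field": "Languages", "Instructor": "John Doe", "Instructor Phone": "0123456789"},
    {"Course": "Drawing", "Field": "Art", "Instructor": "Alan Smith", "Instructor Phone": "9856321158"},
    {"Course": "PHP", "Field": "Programming", "Instructor": "Camella Ford", "Instructor Phone": "2225558887"},
    {"Course": "C++", "Field": "Programming", "Instructor": "Camella Ford", "Instructor Phone": "2225558887"},
]


def holds(rows, lhs, rhs):
    """Return True if every lhs value is paired with exactly one rhs value.

    `holds` is a hypothetical helper for this sketch, not a library function.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return all(len(values) == 1 for values in seen.values())


print(holds(COURSES, "Course", "Instructor"))            # True
print(holds(COURSES, "Instructor", "Instructor Phone"))  # True
print(holds(COURSES, "Course", "Instructor Phone"))      # True (the transitive one)
print(holds(COURSES, "Instructor", "Course"))            # False: John Doe teaches two courses
```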
Now for the problems:
- If you delete both the French and English courses then you will delete their instructor John Doe as well, and his phone number will be lost forever.
- You can't add a new Instructor to your database unless you add a Course for him first, or you can duplicate the data in an Instructors table, which is even worse.
- If Instructor John Doe changes his phone number then you will have to update all Courses that he teaches with the new info, which is very prone to mistakes.
- If you want to store another detail about instructors, such as a birth date, you will have to add a Birth Date field to the Courses table. Does this even sound logical? Why keep instructor information in the courses table in the first place?

One way to express the 3NF is:
All attributes should depend on the key, whole key and nothing but the key.
The transitive dependency X->Y->Z violates that principle, leading to data redundancy and potential modification anomalies.
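Let us break this down:
1. By definition, for X->Y->Z to be transitive, X<-Y must not hold.
2. If Y were a key, X<-Y would hold, so Y cannot be a key. (FOOTNOTE1)
3. Since Y is not a key, any given Y value may be repeated in multiple rows.
4. Y->Z implies that all rows holding the same Y must also hold the same Z. (FOOTNOTE2)
5. Repeating the same (Y, Z) pair in multiple rows adds no useful information; it is redundant.
In short, since Y is not a key and Y->Z, we have violated the 3NF.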
Redundancies lead to modification anomalies (e.g. updating some but not all of the Zs "connected" to the same Y essentially corrupts the data, since you no longer know which copy is correct). This is typically resolved by splitting the original table into two tables, one containing {X, Y} and the other containing {Y, Z}. This way, Y can be a key in the second table and Z is not repeated.
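The split can be sketched in a few lines of Python. This is only an illustration: it borrows course/instructor/phone rows like those in the example table elsewhere on this page as X, Y and Z, and `project` is a made-up helper, not a library call.

```python
# (X, Y, Z) = (Course, Instructor, Instructor Phone)
ROWS = [
    ("English", "John Doe", "0123456789"),
    ("French", "John Doe", "0123456789"),
    ("Drawing", "Alan Smith", "9856321158"),
    ("PHP", "Camella Ford", "2225558887"),
    ("C++", "Camella Ford", "2225558887"),
]


def project(rows, indexes):
    """Keep only the given column positions and drop duplicate tuples."""
    result = []
    for row in rows:
        item = tuple(row[i] for i in indexes)
        if item not in result:
            result.append(item)
    return result


course_instructor = project(ROWS, (0, 1))  # table {X, Y}
instructor_phone = project(ROWS, (1, 2))   # table {Y, Z}, Y is now a key

# Rejoin on Y to confirm that nothing was lost by splitting.
phone_of = dict(instructor_phone)
rejoined = [(course, instructor, phone_of[instructor])
            for course, instructor in course_instructor]

print(len(course_instructor), len(instructor_phone))  # 5 3
print(rejoined == ROWS)                                # True
```

Each phone number is now stored once per instructor (3 rows instead of 5), and joining the two tables on the instructor gives back the original rows.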
On the other hand, if the X<-Y does hold (i.e. X->Y->Z is not transitive), then we can retain a single table, where both X and Y are keys. Z won't be unnecessarily repeated in this scenario.
(FOOTNOTE1) A key is a (minimal) set of attributes that functionally determine all of the attributes in a relation. Rationale: If K is a key, there cannot be multiple rows with the same value of K, so any given value of K is always associated to precisely one value of every other attribute (assuming 1NF). By definition (see FOOTNOTE2), "being associated to precisely one" is the same thing as "being in a functional dependency".
(FOOTNOTE2) By definition, Y->Z if, and only if, each Y value is associated with precisely one Z value.
Example:
Assuming each message has exactly one author and each author has exactly one primary e-mail, attempting to represent messages and users in the same table would lead to repeating e-mails:
MESSAGE                         USER  EMAIL
-------                         ----  -----
Hello.                          Jon   [email protected]
Hi, how are you?                Rob   [email protected]
Doing fine, thanks for asking.  Jon   [email protected]
(In reality, these would be MESSAGE_IDs, but let us keep things simple here.)
Now, what happens if Jon decides to change his e-mail to, say, "[email protected]"? We would need to update both of Jon's rows. If we only update one, then we have the following situation...
MESSAGE                         USER  EMAIL
-------                         ----  -----
Hello.                          Jon   [email protected]
Hi, how are you?                Rob   [email protected]
Doing fine, thanks for asking.  Jon   [email protected]
...and we no longer know which one of Jon's e-mails is correct. We have essentially lost the data!
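To make the anomaly concrete, here is a small sketch using Python's built-in sqlite3 module. The table name `flat_messages`, the column name `user_name` and the e-mail addresses are placeholders invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flat_messages (message TEXT, user_name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO flat_messages VALUES (?, ?, ?)",
    [
        ("Hello.", "Jon", "jon@example.com"),
        ("Hi, how are you?", "Rob", "rob@example.com"),
        ("Doing fine, thanks for asking.", "Jon", "jon@example.com"),
    ],
)

# A buggy client updates only one of Jon's two rows.
conn.execute(
    "UPDATE flat_messages SET email = ? WHERE message = ?",
    ("jon.new@example.com", "Hello."),
)

# The table now holds two different e-mails for the same user.
emails = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT email FROM flat_messages WHERE user_name = ? ORDER BY email",
        ("Jon",),
    )
]
print(emails)  # ['jon.new@example.com', 'jon@example.com']
```

Nothing in the schema stops this partial update, which is the problem.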
The situation is especially bad since there is no declarative constraint we could use to make the DBMS enforce both updates for us. Client code is likely to have bugs, and is often written without much regard for the complex interactions that can happen in a concurrent environment.
However, if you split the table...
MESSAGE                         USER
-------                         ----
Hello.                          Jon
Hi, how are you?                Rob
Doing fine, thanks for asking.  Jon

USER  EMAIL
----  -----
Jon   [email protected]
Rob   [email protected]
...there is now only one row that knows about Jon's e-mail, so ambiguity is impossible.
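A possible way to set up the split tables, again sketched with Python's sqlite3 module. The table and column names (`users`, `messages`, `user_name`) and the e-mail addresses are assumptions made for the example, not something prescribed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE users (user_name TEXT PRIMARY KEY, email TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE messages ("
    " message TEXT NOT NULL,"
    " user_name TEXT NOT NULL REFERENCES users(user_name))"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("Jon", "jon@example.com"), ("Rob", "rob@example.com")],
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [
        ("Hello.", "Jon"),
        ("Hi, how are you?", "Rob"),
        ("Doing fine, thanks for asking.", "Jon"),
    ],
)

# Changing Jon's e-mail touches exactly one row.
updated = conn.execute(
    "UPDATE users SET email = ? WHERE user_name = ?",
    ("jon.new@example.com", "Jon"),
).rowcount

# Every one of Jon's messages sees the same e-mail through the join.
rows = conn.execute(
    "SELECT m.message, u.email FROM messages AS m"
    " JOIN users AS u ON u.user_name = m.user_name"
    " WHERE m.user_name = ?",
    ("Jon",),
).fetchall()
jon_emails = {email for _, email in rows}

print(updated)     # 1
print(jon_emails)  # {'jon.new@example.com'}
```

The primary key on `users.user_name` is what guarantees a single e-mail per user; the foreign key just keeps messages from pointing at users that do not exist.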
BTW, all this can be viewed as just another expression of the DRY principle.