What is wrong with a transitive dependency?

Tags: sql, database, transitive-dependency, database-design

I have some transitive dependencies in my database design. I have been told by my superiors that these can cause bugs. I am finding it difficult to find resources that will tell me how having these dependencies will cause bugs. What kind of problems will they cause?

I am not disputing the fact, just eager to learn what kind of problems they can cause.

Edit for more details:

From Wikipedia:

Transitive dependency
A transitive dependency is an indirect functional dependency, one in which X→Z only by virtue of X→Y and Y→Z.

336

asked Mar 30 '12 21:03

Simon Kiely

2 Answers

I'll explain by an example:

-------------------------------------------------------------------
|  Course  |    Field     |   Instructor   |  Instructor Phone    |
-------------------------------------------------------------------
|  English |  Languages   |  John Doe      |     0123456789       |
|  French  |  Languages   |  John Doe      |     0123456789       |
|  Drawing |  Art         |  Alan Smith    |     9856321158       |
|  PHP     |  Programming |  Camella Ford  |     2225558887       |
|  C++     |  Programming |  Camella Ford  |     2225558887       |
-------------------------------------------------------------------

If you have a Course you can easily get its Instructor so Course->Instructor.
If you have an Instructor you can't get his Course as he might be teaching different courses.
If you have an Instructor you can easily get his Phone so Instructor->Phone.

That means that if you have a Course, you can get the Instructor Phone, so Course->Instructor Phone holds (i.e. a transitive dependency).

Now for the problems:

1. If you delete both the French and English courses, you also delete their instructor John Doe, and his phone number is lost forever.
2. There is no way to add a new Instructor to your database unless you add a Course for him first; the alternative is duplicating the data in a separate Instructors table, which is even worse.
3. If Instructor John Doe changes his phone number, you have to update every Course he teaches with the new number, which is very prone to mistakes.
4. You can't delete an Instructor from your database unless you delete all the courses he teaches or set all his fields to null.
5. What if you decide to keep the birth date of your instructors? You will have to add a Birth Date field to the Courses table. Does this even sound logical? Why keep instructor information in the courses table in the first place?
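
Two of these problems are easy to reproduce. The following is only a rough sketch, assuming an in-memory SQLite database driven from Python; the table name `courses`, the snake_case column names, the helper `phones_for`, and the replacement phone number `5550000000` are made up for illustration and are not part of the original example.

```python
import sqlite3

# Hypothetical schema mirroring the single table shown in this answer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE courses ("
    " course TEXT PRIMARY KEY,"
    " field TEXT,"
    " instructor TEXT,"
    " instructor_phone TEXT)"
)
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?, ?)",
    [
        ("English", "Languages", "John Doe", "0123456789"),
        ("French", "Languages", "John Doe", "0123456789"),
        ("Drawing", "Art", "Alan Smith", "9856321158"),
        ("PHP", "Programming", "Camella Ford", "2225558887"),
        ("C++", "Programming", "Camella Ford", "2225558887"),
    ],
)


def phones_for(instructor):
    """Return the sorted distinct phone numbers stored for one instructor."""
    rows = conn.execute(
        "SELECT DISTINCT instructor_phone FROM courses"
        " WHERE instructor = ? ORDER BY instructor_phone",
        (instructor,),
    )
    return [r[0] for r in rows]


# Update anomaly: only one of Camella Ford's two rows gets the new
# (made-up) number, so the table now holds two conflicting phones for her.
conn.execute(
    "UPDATE courses SET instructor_phone = '5550000000' WHERE course = 'PHP'"
)
ford_phones = phones_for("Camella Ford")

# Deletion anomaly: removing both language courses removes every row
# that mentioned John Doe, so his phone number is no longer stored anywhere.
conn.execute("DELETE FROM courses WHERE course IN ('English', 'French')")
doe_phones = phones_for("John Doe")
```

After the partial update, `ford_phones` holds two different numbers, and after the delete, `doe_phones` is empty. Nothing in the schema stops either outcome.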

165

answered Oct 03 '22 20:10

Songo

One way to express 3NF is:

All attributes should depend on the key, the whole key, and nothing but the key.

The transitive dependency X->Y->Z violates that principle, leading to data redundancy and potential modification anomalies.

Let us break this down:

By definition, for the chain of functional dependencies X->Y->Z to be transitive, X<-Y (that is, Y->X) must not hold.
If Y were a key, X<-Y would hold, so Y cannot be a key. (FOOTNOTE1)
Since Y is not a key, any given Y can be repeated in multiple rows.
Y->Z implies that all rows holding the same Y must also hold the same Z. (FOOTNOTE2)
Repeating the same (Y, Z) tuple in several rows does not contribute any useful information to the system. It is redundant.

In short, since Y is not a key and Y->Z, we have violated 3NF.

Redundancies lead to modification anomalies (e.g. updating some but not all of the Zs "connected" to the same Y essentially corrupts the data, since you no longer know which copy is correct). This is typically resolved by splitting the original table into two tables, one containing {X, Y} and the other containing {Y, Z}. This way, Y can be a key in the second table and Z is not repeated.

On the other hand, if X<-Y does hold (i.e. X->Y->Z is not transitive), then we can retain a single table, where both X and Y are keys. Z won't be unnecessarily repeated in this scenario.
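
The single-table case can be sketched as follows. This is only an illustration, assuming SQLite from Python; the table name `r`, the column names, and the values are hypothetical. Declaring Y as a second key (`UNIQUE`) means the same (Y, Z) pair cannot be stored twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# x is the primary key; y is declared as an alternate key via UNIQUE.
conn.execute(
    "CREATE TABLE r ("
    " x TEXT PRIMARY KEY,"
    " y TEXT NOT NULL UNIQUE,"
    " z TEXT)"
)
conn.execute("INSERT INTO r VALUES ('x1', 'y1', 'z1')")

# A second row repeating the same y (and therefore the same z) is rejected.
try:
    conn.execute("INSERT INTO r VALUES ('x2', 'y1', 'z1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```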

(FOOTNOTE1) A key is a (minimal) set of attributes that functionally determine all of the attributes in a relation. Rationale: If K is a key, there cannot be multiple rows with the same value of K, so any given value of K is always associated to precisely one value of every other attribute (assuming 1NF). By definition (see FOOTNOTE2), "being associated to precisely one" is the same thing as "being in a functional dependency".

(FOOTNOTE2) By definition, Y->Z if, and only if, each Y value is associated with precisely one Z value.
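
The definition in FOOTNOTE2 can be checked mechanically against sample rows. The sketch below is an illustration only: the function name `holds` and the sample values (including the `example.com` addresses) are invented here, and a check over sample data can only refute a dependency, not prove that it holds for all future data.

```python
def holds(rows, lhs, rhs):
    """Return True if, in these rows, each lhs value maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it;
        # a later, different value means the dependency is violated.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data using the X, Y, Z naming from this answer.
rows = [
    {"X": "m1", "Y": "Jon", "Z": "jon@example.com"},
    {"X": "m2", "Y": "Rob", "Z": "rob@example.com"},
    {"X": "m3", "Y": "Jon", "Z": "jon@example.com"},
]

x_to_y = holds(rows, ["X"], ["Y"])
y_to_z = holds(rows, ["Y"], ["Z"])
y_to_x = holds(rows, ["Y"], ["X"])  # fails: "Jon" appears with both m1 and m3

# X->Y->Z is transitive when X->Y and Y->Z hold but Y->X does not.
is_transitive = x_to_y and y_to_z and not y_to_x
```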

Example:

Assuming each message has exactly one author and each author has exactly one primary e-mail, attempting to represent messages and users in the same table would lead to repeating e-mails:

MESSAGE                         USER    EMAIL
-------                         ----    -----
Hello.                          Jon     [email protected]
Hi, how are you?                Rob     [email protected]
Doing fine, thanks for asking.  Jon     [email protected]

(In reality, these would be MESSAGE_IDs, but let us keep things simple here.)

Now, what happens if Jon decides to change his e-mail to, say, "[email protected]"? We would need to update both of Jon's rows. If we only update one, then we have the following situation...

MESSAGE                         USER    EMAIL
-------                         ----    -----
Hello.                          Jon     [email protected]
Hi, how are you?                Rob     [email protected]
Doing fine, thanks for asking.  Jon     [email protected]

...and we no longer know which one of Jon's e-mails is correct. We have essentially lost the data!
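
The partial update described above can be reproduced, and the damage detected after the fact, with a small sketch like this one. It assumes SQLite from Python; the table name `message`, the column name `user_name`, and every e-mail address are placeholders invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (message TEXT, user_name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO message VALUES (?, ?, ?)",
    [
        ("Hello.", "Jon", "jon@example.com"),
        ("Hi, how are you?", "Rob", "rob@example.com"),
        ("Doing fine, thanks for asking.", "Jon", "jon@example.com"),
    ],
)

# Partial update: only one of Jon's two rows receives the new address.
conn.execute(
    "UPDATE message SET email = 'jon.new@example.com' WHERE message = 'Hello.'"
)

# Users that now have more than one distinct e-mail stored.
conflicts = conn.execute(
    "SELECT user_name, COUNT(DISTINCT email) FROM message"
    " GROUP BY user_name HAVING COUNT(DISTINCT email) > 1"
).fetchall()
```

The query reports that Jon has two e-mails, but it cannot say which of the two is the current one.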

The situation is especially bad since there is no declarative constraint we could use to coerce the DBMS into enforcing both updates for us. Enforcement is left to the client code, which will have bugs and is probably written without much regard for the complex interactions that can happen in a concurrent environment.

However, if you split the table...

MESSAGE                         USER
-------                         ----
Hello.                          Jon
Hi, how are you?                Rob
Doing fine, thanks for asking.  Jon

USER    EMAIL
----    -----
Jon     [email protected]
Rob     [email protected]

...there is now only one row that knows about Jon's e-mail, so ambiguity is impossible.
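
A minimal sketch of the split design, again assuming SQLite from Python. The table names `app_user` and `message`, the column names, and the e-mail addresses are hypothetical; a real schema would likely key messages by an ID, as noted earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

# {USER, EMAIL}: the user is the key, so each e-mail is stored exactly once.
conn.execute(
    "CREATE TABLE app_user ("
    " user_name TEXT PRIMARY KEY,"
    " email TEXT NOT NULL)"
)
# {MESSAGE, USER}: each message points at its author.
conn.execute(
    "CREATE TABLE message ("
    " message TEXT PRIMARY KEY,"
    " user_name TEXT NOT NULL REFERENCES app_user (user_name))"
)
conn.executemany(
    "INSERT INTO app_user VALUES (?, ?)",
    [("Jon", "jon@example.com"), ("Rob", "rob@example.com")],
)
conn.executemany(
    "INSERT INTO message VALUES (?, ?)",
    [
        ("Hello.", "Jon"),
        ("Hi, how are you?", "Rob"),
        ("Doing fine, thanks for asking.", "Jon"),
    ],
)

# A single UPDATE touches the only row that stores Jon's e-mail.
cur = conn.execute(
    "UPDATE app_user SET email = 'jon.new@example.com' WHERE user_name = 'Jon'"
)
rows_changed = cur.rowcount

# Every one of Jon's messages now resolves to the same, new address.
emails_seen = conn.execute(
    "SELECT m.message, u.email FROM message AS m"
    " JOIN app_user AS u ON u.user_name = m.user_name"
    " WHERE m.user_name = 'Jon' ORDER BY m.message"
).fetchall()
```

Because the e-mail lives in one row, a half-applied update is no longer possible; the join recovers the combined view whenever it is needed.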

BTW, all this can be viewed as just another expression of the DRY principle.

answered Oct 03 '22 19:10

Branko Dimitrijevic
