So I am looking over my database notes and material trying to refresh myself on the general concepts and terminology for upcoming interviews. I have gotten stuck at dependency however and lossless-join decompositions though. I have searched all over and see lots of mathy equations but I am looking for a plain and simple English response or example.
I have a found a powerpoint from http://www.cs.kent.edu/~jin/DM09Fall/lecture6.ppt which illustrates an example that I cannot fully understand. It is posted below.
R = (A, B, C)F = {A → B, B → C)
Can be decomposed in two different ways
R1 = (A, B), R2 = (B, C)
Lossless-join decomposition:
R1 ∩ R2 = {B} and B → BC
Dependency preserving
R1 = (A, B), R2 = (A, C)
Lossless-join decomposition:
R1 ∩ R2 = {A} and A → AB
Not dependency preserving (cannot check B -> C without computing R1 ⋈ R2)
So I understand that having A → B and B → C means that you have a "reference" in each other, whereas A → B and A → C means there is no reference or link between B and C.
So,
Does Lossless-join decomposition mean that the data overall is still intact? In both scenarios, you can still eventually get both data, right? If this is wrong, please correct me! :)
What is the significance of having that connection B to C in the second decomposition, and how does that make it not dependency preserving?
If A is deleted you will simply have B and C with no relations.
If B is deleted you will still have A → C.
If C is deleted you will still have A → B.
Because even in the first example you will still end up with similiar results upon deleting values.
If A is deleted you will still have a relation of B → C.
If B is deleted you will simply have A and C with no relations.
If C is deleted you will have a relation of A → B.
So either way, if you delete each item you will still have two instances of a relation and one instance of two items having no relations
My guess would be that in deleting the "middle man relation" (is there a term for that), B in example 1 and A in example 2, is that you can still relate example 1's A → B then B → C, while in example 2 you are stuck with A → B with no connection to A → C.
But as you can see I am now back to square one as to why this causes data dependency and while googling "what is data dependency" or "examples of data dependency" it is simply not making any sense to me.
If anyone could help clarify this for me it would be greatly appreciated.
Decomposition of a relation R into R1 and R2 is Lossless join decomposition if you can construct back R by joining the relation R1 and R2(form R1 ⋈ R2 you can obtain R).
For a decomposition of Relation R into R1 and R2 to be lossless, it must satisfy any of 2 condition:
1. R1 ∩ R2 -> R1
2. R1 ∩ R2 -> R2
If the above relation doesn't make any sense then think of it like this, when you are intersecting 2 relation R1 and R2 and obtaining common attributes then if the common attributes are able to determine any one of the relation then this (these) common attribute(s) is (are) candidate key(s) for the obtained relation(think why ?) and hence you can join using this candidate key the other relation to obtain R.
Regarding dependency preserving, a decomposition of relation R is dependency preserving if the Functional dependency of R can be obtained by taking the union of the functional dependency of all the decomposed relation.
Lossless Join doesn't mean that any of the tuples are lost, or cannot be recovered after a join. Lossless join means generation of spurious tuples which lead to "extra" rows (tuples) and "information" is said to be lost.
Putting it simply I think dependency is preserved when we find both sides of the FD in the attributes of one sub-schema. (someone please correct me if I'm wrong).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With