Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

(De)Normalization of two relations

People who read C.J.Date's Introduction to Database System or books of similar level should not have problems with definition of normalization and denormalization.

However, memory is not what it used to be and I find myself often looking at some design and saying that it is not normalized even though I can not find which of the normal forms it is breaking.

The actual example that illustrate it is:

If we have relations

r1 (A, B, C) and r2 (A, D)

with FDs: AB->C and A->D

and r1 represent detailed data, while r2 is summary of that data (in another words each instance of D is a function of values in r1. in this example let it be subtotal of values C according to A from r1).

Example instance

r1 = 
A  B  C  
1  1  10
1  2  20
2  1  10
2  2  25

r2 =
A  D
1  30
2  35

So, even though I can not say that it breaks for example 2NF or 3NF, I seem to be stuck on the idea that the design is still denormalised in the following sense (from Codd, E.F. "Further Normalization of the Data Base Relational Model", p. 34, commenting on the reasons to normalize beyond 1NF):

  1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
  2. To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs;
  3. To make the relational model more informative to users;
  4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.

As I can say, that if we define D as a sum of all Cs from r1 where A from r1 is equal to A from r2 then, if we update C in r1 and we don't update D in r2, we can end up with undesirable update dependency and the data ends up in inconsistent state I find this reason to call r1 and r2 denormalized and to think of them as denormalized. (In fact whole r2 is a function of r1 and bring zero new facts into the model; r2 = f(r1))

So the questions are

  1. can we call r1 and r2 denormalized?
  2. if yes, why? if not, why? (according to which rule? or according to which definition?)

NOTE:
To those who find the question(s) interesting enough to put in an answer, I kindly ask to provide either something quotable or to put it in a form of specific assumptions and conclusions (or in another words, if you are going to put in your opinion, please follow it with some reasoning).

EDIT I accepted dportas answer. I'll try to add a bit to it here: C.J.Date can makes a clear and strict distinction:

Much of design theory has to do with reducing redundancy; normalization reduces redundancy within relvars, orthogonality reduces it across relvars.

quoted from Database in depth: relational theory for practitioners

and on the next page

just as a failure to normalize all the way implies redundancy and can lead to certain anomalies, so too can a failure to adhere to orthogonality.

like image 929
Unreason Avatar asked Nov 30 '10 09:11

Unreason


2 Answers

Assuming AB is a key in r1 and A is a key in r2 then it seems that the schema is in 6NF. The Relational Database Dictionary (Date) defines denormalization as:

Replacing a set of relvars R1, R2, . . ., Rn by their join R, such that for all i the projection of R on the attributes of Ri is guaranteed to be equal to Ri (i = 1, 2, . . ., n).

Fundamentally, normalization/denormalization is about composition and nonloss decomposition using projection and join operators. In this example you have redundancy caused by a different operator: summation. I expect it would be quite possible in principle to form a theory of "normalization" for operators other than projection and join, perhaps even for non-relational functions like summation. That's not how normalization is conventionally defined however and in the absence of any sound basis for doing otherwise I think we ought to apply the technical meaning denormalization as defined by Date in the above quotation.

like image 130
nvogel Avatar answered Oct 13 '22 10:10

nvogel


Your definition for column D in r2, "a sum of all Cs from r1 where A from r1 is equal to A from r2", is a constraint on D. More formally, where Σ is summation, π is projection and σ is selection,

(a,d) ∈ r2 ⇔ (a, d) = (a, Σ c), a ∈ πA(r1), c ∈ πCA=a(r1))

Since this constraint is neither a domain constraint nor a key constraint, r2 is not in Domain/Key Normal Form (DKNF).

DKNF is the only normal form of which I'm aware that isn't defined in terms of a single relation, chiefly because it's defined in terms of constraints rather than dependencies.

like image 26
outis Avatar answered Oct 13 '22 09:10

outis