Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does adding "ID" field to database tables according to the third normal form considered a mistake?

Is it OK to add "ID" field as primary key to all my database tables also to use it to make relations ships between tables? Is this design would consider as 3NF (third normal form) design? If yes, is this thing recommended theoretically or not?

like image 214
Amer Avatar asked Nov 07 '10 22:11

Amer


1 Answers

The problem is the question is a bit isolated. But since you are concerned about whether the issue has a theoretical (and possibly a standards-compliance) basis, the answer has to not be so isolated.

If yes, is this thing recommended theoretically or not?

No. It has no academic or theoretical basis whatsoever. It breaks basic Relational Database design rules, and therefore (a) will not produce a Relational database and (b) whatever is produced, will not have the power of relational capability that the users (without having to go through the app) will expect via the many simple Relational database report tools out there.

In fact, it is the unfortunately very common, knee-jerk, quick-and-dirty method of making spreadsheets (which the app developer has identified for their app) fit into a "database" container such as MS SQL. Without doing any of the genuine database design or modelling work that is required for the contents of the container to qualify as a Relational database. Good for getting a prototype or proof of concept going but not ready for any form of development (SQL coding).

Is it OK to add "ID" field as primary key to all my database tables also to use it to make relationships between tables?

Hang on. They cannot be "database tables", by definition. Database tables are arrived at by a formal modelling process, and as a result will have strong Identifiers. And Relations already defined. In which case the question would not be asked. Therefore, since it is being asked, the things you are asking about are nowhere near "database tables". They are just one app developers scratchpad for one app.

Adding an FK constraint to one spreadsheet, to "relate" it to another spreadsheet, and adding an "ID" PK, does not make a "relational" "database". No, it merely uses the capability of SQL to relate otherwise unrelated spreadsheets in the container. They remain unrelated spreadsheets, "linked" by an added "ID" column.

The result is, substantial duplication of the data; update anomalies; many more indices; larger "relational" data sets; poor performance; massive over-use of temp tables; complex SQL, all of which can be avoided by genuine database design.

Is this design would consider as 3NF (third normal form) design?

Normalisation is part of (not all of) the database design process. 3NF is arrived at through that process. 3NF or whatever NF, is not a label that can be placed on the set of spreadsheets or partially designed contents of the container, without going through the process, and thus earning the badge. One does not "consider" a bunch of spreadsheets or partly designed contents 3NF; one evaluates if the rules of Normalisation have been followed, and if the rules are not violated, then it is fairly labelled 3NF. Since the Normalisation process has not been followed, there is no basis to suggest that it could possibly relate to any Normal Form.

Likewise, over and above Normalisation, if Relational Database rules have been followed during the process, and not violated, one achieves compliance with Relational Database standards. Since Relational Database methodology has not been followed, there is no basis to suggest that it could possibly relate to any Relational Database Standard, or that any Relational capability can be expected from it.

Understanding the Whole Issue

"IDs" are surrogate keys. Surrogate keys are always (you are right) an additional key and Index, additional to the pre-existing PK, which is about to be usurped. Of course, that has a considerable performance cost, on every access.

Some questioners have the idea that Surrogate keys can be used in substitution of the PK. Which of course, is false, and you realise that, so gratefully that does not have to be addressed here.

The notion of "all surrogate keys" or "no surrogate keys" is the kind of black-or-white, all-or-nothing, nonsense that is normal for children but unacceptable in full grown adults, specially anyone doing IT work, which requires precision and understanding. It is quie normal for a small child to think "if daddy doesn't let me do what I want, he doesn't love me", and therefore "if he doesn't love me, he hates me". Most of us realise that life is a tiny bit more complex than that by the age of primary school. Developers who "like" to see" IDs" on every table and "dislike" the lack of them on some tables, are by definition incapable of considering the database as a whole, and the needs of other developers and users; they are only thinking about simplistic, one-table-at-a-time code.

It is also not about shades of grey, or blurred definitions. No, the definitions have not changed in 30 years (they have been extended and made more precise, but they have not changed). Shades of grey allows developers to avoid compliance and standards. So that too, is not recommended.

What is a genuine Relational Database ?

The truth is, if a database was honestly modelled and designed, by a qualified data modeller, using the methodologies that have been available for 30 years, they would end up with a genuinely Relational database. And if they did not follow the process, it would neither Relational nor a Database. The Identifiers and Relations would already be defined, and the meaning, the context, of that would be carried through to various tables. The data would be Normalised, to 3NF or BCNF or 5NF, and there would be no update Anomalies. In the last step, as part of the formal process, and not outside it, when translating the logical to the physical, the modeller may improve the performance of some Identifiers by adding Surrogate keys, and avoiding carrying large (wide) keys into the related child tables (1). That proves again, from another approach, why the notion of zero surrogates, or all surrogates, is childish and completely removed from the genuine process.

The genuine Relational database will have full Relational capability, honest achievement of 3NF, use natural Relational keys, with a definite few being thoughtfully switched to Surrogates.

Easily Proved

Of course, everything I have stated can be proved easily: simply post DDL of 5 to 10 of your spreadsheets, I need a "depth" of at least four (great.grand.parent⇢grand.parent⇢parent⇢child).

You may be interested, I have recently posted information re your question, in a related question, which I am not repeating here.

Note

  1. This is only required because the current SQL offerings do not support the full Relational Model, and to eliminate known performance obstacles that they have. And it will not be necessary if and when suppliers provide Relational databases in which wide Relational keys perform as well as narrow ones.

  2. I agree with Erwin's statements re keys and Identifiers, and thus I have not repeated them in my answer.

like image 93
PerformanceDBA Avatar answered Sep 19 '22 10:09

PerformanceDBA