Suppose I have two tables on a database, T10 and T11, having 10 and 11 columns, respectively, where 10 of the columns are exactly the same on both.
What (if any) normalization rule am I violating?
Edit: I have been informed that no Normal Forms are violated here in theory. Since this was the accepted answer, I'm leaving it here for reference, and because thinking about 3NF may in practice help avoid situations like that in the question.
You are violating the Third Normal Form (3NF), because if mostly the same data is held in both tables, then every attribute of each table is not directly dependent on the key of its respective table.
Believe it or not, duplicating columns across tables violates no theoretical normal form in and of itself. Except for domain/key normal form (DKNF), normal forms are defined in terms of individual, not multiple, tables. DKNF is defined in terms of constraints, of which there are none in the general case. Thus, if there's a violation of a normal form:
Consider the normal forms, using the brief definitions from the Wikipedia article:
The table faithfully represents a relation and has no repeating groups.
This one's fairly straight forward. The term "repeating groups" has multiple meanings in the theory, but none of them have anything to do with duplicate columns or data.
No non-prime attribute in the table is functionally dependent on a proper subset of any candidate key.
Here, the important term to examine is "functional dependence". Essentially, a functional dependence is where you project a relation to two columns, X and Y, and wind up with a function X → Y. You can't have a functional dependency between two (or more) tables*. Additionally, candidate keys can't span multiple tables.
Every non-prime attribute is non-transitively dependent on every candidate key in the table.
Transitive dependence is defined in terms of functional dependence: a transitive dependency is a dependency where X → Z only because X → Y and Y → Z. X, Y and Z must be in the same table because these are functional dependencies.
Every non-trivial multivalued dependency in the table is a dependency on a superkey.
Multivalued dependency is a little trickier, but it can be illustrated with an example: "whenever the tuples (a,b,c) and (a,d,e) exist in r, the tuples (a,b,e) and (a,d,c) should also exist in r" (where "r" is a table). Most importantly for the matter at hand, a multivalued dependency only applies to a single table.
Every non-trivial join dependency in the table is implied by the superkeys of the table.
A table has a join dependency if it can be expressed as the natural join of other tables. These other tables, however, don't need to exist in the database. If table T11 in the example had a join dependency, it would still have one even if you removed table T10
Table features no non-trivial join dependencies at all (with reference to generalized join operator).
Same reasoning for 5NF.
Every non-trivial functional dependency in the table is either the dependency of an elementary key attribute or a dependency on a superkey.
Same reasoning for 2NF.
Every non-trivial functional dependency in the table is a dependency on a superkey.
Same reasoning for 2NF.
Every constraint on the table is a logical consequence of the table's domain constraints and key constraints.
If T11 has a constraint that depends on T10, then it's either a key constraint or a more complex constraint that still refers to T10. The latter case is not the general case mentioned in the question. In other words, while there could be specific schemata with duplicate columns that violate DKNF, it's not true in general. Furthermore, it's the constraint (not the normal form) that's defined in terms of multiple tables and the constraint (not the column duplication) that causes DKNF violation.
The purpose for normalization includes preventing anomalies. However, normalization isn't complete in that it doesn't guarantee a relational database will be completely free from anomalies. This is one instance where practice diverges from theory.
If this still doesn't convince you, consider the schema KM.'s comment hints at, where T11 represents a history (or versioned) version of T10. The primary key of T11 consists of the primary key columns held in common with T10, plus the extra column (the date/version column). That T11 has different candidate keys makes all the difference between an anomaly-prone and anomaly free, normalized design.
*Someone might think that you could use joins to create a dependency between two tables. While a join might create a table that has a dependency, the dependency exists on this table, not between the constituents of the join. In the case at hand, this again means that one of the tables would be a joined table and would suffer from the dependency itself, irrespective of the other table in the database.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With