Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best schema design for table relationship that enforces integrity

Given a table of models 'A', that can have multiple child models 'B', of which 'B' will have one or more child models 'C'.. this sounds simple, however I need to enforce that for each 'A', any 'B' must have a unique collection of 'C'.. for example, C cannot be a child of two 'B's that are part of the same parent 'A'.. but a 'C' can be a child of multiple 'B's given that each 'B's parent 'A' is distinct..

Does that make sense or should I unobfuscate my scenario? cheers in advance!

Note that I know this policy will be enforced in the application, but I wan't it to be impossible for the database to be in an invalid state.

Edit: hi all, fantastic feedback so firstly I must thank you all for sharing your knowledge with me.

Just to clarify the situation, I'll explain the scenario but here are some notes:

'A' has zero or more 'B', a 'B' is implicitly associated with an 'A', and as such is always a child of just one 'A'. 'C' is somewhat of a root entity that is associated with many 'B's as well as other elements in the database.


Heres the real story:

This is a website that contains many briefs (A), and many members (C), a brief can have many submissions (B), of which a submission will always have one or more associated members. The idea is that a submission can in fact be a collaboration, and each member has no more 'power' than any other, but there will be a different system in place to validate the policy of how members work together.

So, per brief, a member can only submit a single submission and a submission can have many members (collaborators).

Hope that helps, but I think you've given me plenty of help already!

Steve.

like image 990
meandmycode Avatar asked Mar 17 '09 11:03

meandmycode


People also ask

What is the most important thing when designing your database schema?

Naming conventions: Define and use appropriate naming conventions to make your database schema designs most effective. While you may decide on a particular style or else adhere to an ISO standard, the most important thing is to be consistent in your name fields.

What are the three types of database design?

Hierarchical database model. Relational model. Network model. Object-oriented database model.

How do you design a database schema example?

There are mainly two tools you'd use to create a database schema. The first is an RDBMS, or a relational database management system. Examples of RDBMSs include SQL Server, MySQL, and PostgreSQL. The second and most important tool is SQL (structured query language).


2 Answers

I think that you need SQL standard assertions, which are (unfortunately) largely unimplemented by actual DBMS.

All the answers are agreeing that there are three primary tables called TableA, TableB, and TableC, each containing its own ID column:

TableA (A_ID PRIMARY KEY, ...)
TableB (B_ID PRIMARY KEY, ...)
TableC (C_ID PRIMARY KEY, ...)

It is not clear from the description of the problem whether a single B value can have multiple A parent entries. It is clear that a single C can have multiple B parent entries. If a B is tied to a single A, the design of TableB can be revised to:

TableB (B_ID, ..., A_ID REFERENCES TableA)

If a B can be associated with several different A's, then the connection is best represented by a joining table:

A_and_B (A_ID REFERENCES TableA,
         B_ID REFERENCES TableB,
         PRIMARY KEY (A_ID, B_ID)
        )

It is also not clear from the description whether the C's associated with a B must be the same for every A that the B is associated with, or whether different A's can reference the same B, and the set of C's associated with the B for A1 can be different from the set of C's associated with the B for A2. (Of course, if a single B can only be associated with one A, this issue is moot.)

For the purposes of this answer, I'm going to assume that any B is associated with a single A, so the structure of TableB includes A_ID as a foreign key. Since a single C can be associated with multiple B's, the relevant structure is a new joining table:

B_and_C (B_ID REFERENCES TableB,
         C_ID REFERENCES TableC,
         PRIMARY KEY (B_ID, C_ID)
        )

Simplifying (by omitting rules about deferrability and immediacy) an assertion looks like:

CREATE ASSERTION assertion_name CHECK ( <search_condition> )

So, once we have a set of design decisions, we can write an assertion to validate the data. Given tables TableA, TableB (with foreign key A_ID), TableC and B_and_C, the requirement is that the number of occurrences of a given C_ID across a complete A is 1.

CREATE ASSERTION only_one_instance_of_c_per_a CHECK
(
     NOT EXISTS (
         SELECT A_ID, COUNT(C_ID)
             FROM TableB JOIN B_and_C USING (C_ID)
             GROUP BY A_ID
             HAVING COUNT(C_ID) > 1
     )
)

[Amended: I think this is more accurate:

CREATE ASSERTION only_one_instance_of_c_per_a CHECK
(
     NOT EXISTS (
         SELECT A_ID, C_ID, COUNT(*)
             FROM TableB JOIN B_and_C USING (C_ID)
             GROUP BY A_ID, C_ID
             HAVING COUNT(*) > 1
     )
)

]

The set of join conditions varies with other rules for how the tables are connected, but the overall constraint structure remains the same - there must not exist more than one reference to a given C_ID for a particular A_ID.


In the comments below, meandmycode notes:

I get the feeling that there's a flaw in my design. My real world logic is that a 'B' always has at least one child 'C'. This doesn't make sense given that 'B' must exist before its child can be attached. The database would currently allow a 'B' to be attached to an 'A' without having at least ONE 'C'.. child, I'm as such going to revise 'B' so that it has a field that refers to its primary child 'C', as well as having a child collection of additional 'C's, but now I have a collection that could also include the primary 'C' specified by the 'B', which would be.. wrong.

Is there a db pattern that would infer a 'one or more children' rule, vs zero or more?

I think you do have problems with your model. It is hard to create a B if there must already exist a C that refers to the newly created B, especially if C's must only refer to existing B's. The phrase 'chicken and egg' comes to mind. So, normally, you allow B's to have zero or more C's in a context like this.

You've still not stipulated whether TableB has an A_ID foreign key or whether you have a linking table like A_and_B. If it has a foreign key, then presumably you cannot create a B until you've created the A to which it refers.

I don't think including one C ID in table B is a good idea - it makes for asymmetric processing (harder SQL). It also means that if you need to delete that one C, you have to update things so that one of the other C references is deleted from the table it is currently in, and then update the value in the B record. That's messy, to be polite about it.

I think you need to amend your question to define the actual table structure you're looking at - along the lines shown in various answers; you can use triple dots to represent other but irrelevant columns. The assertion I suggested would probably have to be implemented as some sort of trigger - which gets into DBMS-specific notations.


From the amended description of briefs (A), submissions (B) and members (C), it is clear that a single submission applies to just one brief, so that submissions can have a simple foreign key that identifies the brief it is a submission for. And a member can only collaborate on one submission for a particular brief. There will be a table of 'submission_collaborators' with columns to identify the submission and member, the combination is the primary key and each column is a foreign key.

Briefs(Brief_ID, ...)
Submissions(Submission_ID, Brief_ID REFERENCES Briefs, ...)
Members(Member_ID, ...)
Submission_Collaborators(Submission_ID REFERENCES Submissions,
                         Member_ID REFERENCES Members,
                         PRIMARY KEY (Submission_ID, Member_ID)
                        )

Hence, the requirement is that the following query must return no rows:

SELECT s.brief_id, c.member_id, COUNT(*)
    FROM submissions AS s JOIN submission_collaborators AS c
         ON s.submission_id = c.submission_id
    GROUP BY s.brief_id, c.member_id
    HAVING COUNT(*) > 1

This is the same query that I embedded in the CREATE ASSERTION (second variant). You can dig out extra information (brief title, submission title, member name, various dates, etc) as well, but the core of the issue is that the query shown must return no data.

like image 68
Jonathan Leffler Avatar answered Sep 30 '22 17:09

Jonathan Leffler


I think I've got your relationship model captured here; If not then I vote for unobfuscation:

  • A [ {AID}, ... ]
  • B [ {BID}, AID, ... ]
  • C [ {CID}, ... ]
  • B_C_Link [ {BID, CID}, AID ]
    • Additional Unique Index on (AID, CID)

The notation uses the {} primary key indicator. So As can have multiple Bs (by putting an AID on B), Bs can have Cs (by using a many-to-many table B_C_Link), and multiple Cs cannot belong to the same A (by adding the AID to the many-to-many table and enforcing (AID, CID) uniqueness.

like image 27
Chris Shaffer Avatar answered Sep 30 '22 17:09

Chris Shaffer