Best schema design for table relationship that enforces integrity

Tags:

Given a table of models 'A', that can have multiple child models 'B', of which 'B' will have one or more child models 'C'.. this sounds simple, however I need to enforce that for each 'A', any 'B' must have a unique collection of 'C'.. for example, C cannot be a child of two 'B's that are part of the same parent 'A'.. but a 'C' can be a child of multiple 'B's given that each 'B's parent 'A' is distinct..

Does that make sense or should I unobfuscate my scenario? cheers in advance!

Note that I know this policy will be enforced in the application, but I wan't it to be impossible for the database to be in an invalid state.

Edit: hi all, fantastic feedback so firstly I must thank you all for sharing your knowledge with me.

Just to clarify the situation, I'll explain the scenario but here are some notes:

'A' has zero or more 'B', a 'B' is implicitly associated with an 'A', and as such is always a child of just one 'A'. 'C' is somewhat of a root entity that is associated with many 'B's as well as other elements in the database.

Heres the real story:

This is a website that contains many briefs (A), and many members (C), a brief can have many submissions (B), of which a submission will always have one or more associated members. The idea is that a submission can in fact be a collaboration, and each member has no more 'power' than any other, but there will be a different system in place to validate the policy of how members work together.

So, per brief, a member can only submit a single submission and a submission can have many members (collaborators).

Hope that helps, but I think you've given me plenty of help already!

Steve.

990

asked Mar 17 '09 11:03

meandmycode

2 Answers

I think that you need SQL standard assertions, which are (unfortunately) largely unimplemented by actual DBMS.

All the answers are agreeing that there are three primary tables called TableA, TableB, and TableC, each containing its own ID column:

TableA (A_ID PRIMARY KEY, ...)
TableB (B_ID PRIMARY KEY, ...)
TableC (C_ID PRIMARY KEY, ...)

It is not clear from the description of the problem whether a single B value can have multiple A parent entries. It is clear that a single C can have multiple B parent entries. If a B is tied to a single A, the design of TableB can be revised to:

TableB (B_ID, ..., A_ID REFERENCES TableA)

If a B can be associated with several different A's, then the connection is best represented by a joining table:

A_and_B (A_ID REFERENCES TableA,
         B_ID REFERENCES TableB,
         PRIMARY KEY (A_ID, B_ID)
        )

It is also not clear from the description whether the C's associated with a B must be the same for every A that the B is associated with, or whether different A's can reference the same B, and the set of C's associated with the B for A1 can be different from the set of C's associated with the B for A2. (Of course, if a single B can only be associated with one A, this issue is moot.)

For the purposes of this answer, I'm going to assume that any B is associated with a single A, so the structure of TableB includes A_ID as a foreign key. Since a single C can be associated with multiple B's, the relevant structure is a new joining table:

B_and_C (B_ID REFERENCES TableB,
         C_ID REFERENCES TableC,
         PRIMARY KEY (B_ID, C_ID)
        )

Simplifying (by omitting rules about deferrability and immediacy) an assertion looks like:

CREATE ASSERTION assertion_name CHECK ( <search_condition> )

So, once we have a set of design decisions, we can write an assertion to validate the data. Given tables TableA, TableB (with foreign key A_ID), TableC and B_and_C, the requirement is that the number of occurrences of a given C_ID across a complete A is 1.

CREATE ASSERTION only_one_instance_of_c_per_a CHECK
(
     NOT EXISTS (
         SELECT A_ID, COUNT(C_ID)
             FROM TableB JOIN B_and_C USING (C_ID)
             GROUP BY A_ID
             HAVING COUNT(C_ID) > 1
     )
)

[Amended: I think this is more accurate:

CREATE ASSERTION only_one_instance_of_c_per_a CHECK
(
     NOT EXISTS (
         SELECT A_ID, C_ID, COUNT(*)
             FROM TableB JOIN B_and_C USING (C_ID)
             GROUP BY A_ID, C_ID
             HAVING COUNT(*) > 1
     )
)

]

The set of join conditions varies with other rules for how the tables are connected, but the overall constraint structure remains the same - there must not exist more than one reference to a given C_ID for a particular A_ID.

In the comments below, meandmycode notes:

I get the feeling that there's a flaw in my design. My real world logic is that a 'B' always has at least one child 'C'. This doesn't make sense given that 'B' must exist before its child can be attached. The database would currently allow a 'B' to be attached to an 'A' without having at least ONE 'C'.. child, I'm as such going to revise 'B' so that it has a field that refers to its primary child 'C', as well as having a child collection of additional 'C's, but now I have a collection that could also include the primary 'C' specified by the 'B', which would be.. wrong.

Is there a db pattern that would infer a 'one or more children' rule, vs zero or more?

I think you do have problems with your model. It is hard to create a B if there must already exist a C that refers to the newly created B, especially if C's must only refer to existing B's. The phrase 'chicken and egg' comes to mind. So, normally, you allow B's to have zero or more C's in a context like this.

You've still not stipulated whether TableB has an A_ID foreign key or whether you have a linking table like A_and_B. If it has a foreign key, then presumably you cannot create a B until you've created the A to which it refers.

I don't think including one C ID in table B is a good idea - it makes for asymmetric processing (harder SQL). It also means that if you need to delete that one C, you have to update things so that one of the other C references is deleted from the table it is currently in, and then update the value in the B record. That's messy, to be polite about it.

I think you need to amend your question to define the actual table structure you're looking at - along the lines shown in various answers; you can use triple dots to represent other but irrelevant columns. The assertion I suggested would probably have to be implemented as some sort of trigger - which gets into DBMS-specific notations.

From the amended description of briefs (A), submissions (B) and members (C), it is clear that a single submission applies to just one brief, so that submissions can have a simple foreign key that identifies the brief it is a submission for. And a member can only collaborate on one submission for a particular brief. There will be a table of 'submission_collaborators' with columns to identify the submission and member, the combination is the primary key and each column is a foreign key.

Briefs(Brief_ID, ...)
Submissions(Submission_ID, Brief_ID REFERENCES Briefs, ...)
Members(Member_ID, ...)
Submission_Collaborators(Submission_ID REFERENCES Submissions,
                         Member_ID REFERENCES Members,
                         PRIMARY KEY (Submission_ID, Member_ID)
                        )

Hence, the requirement is that the following query must return no rows:

SELECT s.brief_id, c.member_id, COUNT(*)
    FROM submissions AS s JOIN submission_collaborators AS c
         ON s.submission_id = c.submission_id
    GROUP BY s.brief_id, c.member_id
    HAVING COUNT(*) > 1

This is the same query that I embedded in the CREATE ASSERTION (second variant). You can dig out extra information (brief title, submission title, member name, various dates, etc) as well, but the core of the issue is that the query shown must return no data.

answered Sep 30 '22 17:09

Jonathan Leffler

I think I've got your relationship model captured here; If not then I vote for unobfuscation:

A [ {AID}, ... ]
B [ {BID}, AID, ... ]
C [ {CID}, ... ]
B_C_Link [ {BID, CID}, AID ]
- Additional Unique Index on (AID, CID)

The notation uses the {} primary key indicator. So As can have multiple Bs (by putting an AID on B), Bs can have Cs (by using a many-to-many table B_C_Link), and multiple Cs cannot belong to the same A (by adding the AID to the many-to-many table and enforcing (AID, CID) uniqueness.

answered Sep 30 '22 17:09

Chris Shaffer

Related questions
                            
                                Implementing Holidays in an Employee Management System
                            
                                Can I iterate over a cursor in pymssql more than once?
                            
                                How to save multiple values to database from Select2 Widget Yii2
                            
                                Is it worth to save keyword <-> link relation into "hastable" like structure in mysql?
                            
                                SQLAlchemy returning strings instead of datetime objects
                            
                                Business data querying/reporting in service oriented architecture
                            
                                Postgres - how to deploy changes from dev database to live version
                            
                                Changing field order dynamically in mysql
                            
                                relationships between 3 entities in ER diagram--is a ternary enough or are 2 binaries also needed?
                            
                                Best practice modeling data for Cassandra databases
                            
                                Group by and group concat , optimization mysql query without using main pk
                            
                                How to configure database.yml to work in a local environment and in Docker both?
                            
                                how to generate SQL from dbplyr without a database connection?
                            
                                Does MySQL support partial indexes?
                            
                                "no such table" error with in-memory sqlite
                            
                                Extracting RDF triples from Wikidata
                            
                                In a project, is there a choice of database systems?
                            
                                How do you set up database testing using the PHP SimpleTest framework
                            
                                What are the major drawbacks to using OpenOffice DB vs. Microsoft Access?
                            
                                Fetching records with slug instead of ID

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Best schema design for table relationship that enforces integrity

Tags:

database

schema-design

meandmycode

People also ask

2 Answers

Jonathan Leffler

Chris Shaffer

Recent Activity

Donate For Us