How are super- and subtype relationships in ER diagrams represented as tables?

Tags: database, database-design, entity-relationship, erd

I am learning how to interpret Entity Relationship Diagrams into SQL DDL statements and I am confused by differences in notation. Consider a disjoint relationship as in the following diagram:

[Diagram: "Vehicle" box connects to "IsA" triangle, noted as "disjoint," which connects separately to "2WD" box and "4WD" box.]

Would this be represented as:

1. Vehicle, 2WD and 4WD tables (2WD and 4WD would point to the PK of Vehicle); or
2. ONLY the 2WD and 4WD tables (and NO Vehicle table), both of which would duplicate whatever attributes Vehicle would have had?

I think these are other ways of writing the relationship:

[Diagram: "Vehicle" box connects with a thick line to "IsA" triangle, which connects with thin lines separately to "2WD" box and "4WD" box.]

[Diagram: "Vehicle" box connects to "IsA" triangle, which connects separately to "2WD" box and "4WD" box, all by thin lines.]

I'm looking for a clear explanation of the difference in regard to what tables you'd end up with for each diagram.


asked Aug 20 '12 04:08

xingyu

1 Answer

ER Notation

There are several ER notations. I'm not familiar with the one you are using, but it's clear enough that you are trying to represent a subtype (a.k.a. inheritance, category, subclass, generalization hierarchy...). This is the relational cousin of OOP inheritance.

When doing subtyping, you are generally concerned with the following design decisions:

- Abstract vs. concrete: Can the parent be instantiated? In your example: can a Vehicle exist without also being 2WD or 4WD?¹
- Inclusive vs. exclusive: Can more than one child be instantiated for the same parent? In your example, can a Vehicle be both 2WD and 4WD?²
- Complete vs. incomplete: Do you expect more children to be added in the future? In your example, do you expect that a Bike or a Plane (etc.) could later be added to the database model?

The Information Engineering (IE) notation differentiates between inclusive and exclusive subtype relationships. The IDEF1X notation, on the other hand, doesn't (directly) recognize this difference, but it does differentiate between complete and incomplete subtypes (which IE doesn't).

The following diagram from the ERwin Methods Guide (Chapter 5, Subtype Relationships) illustrates the difference:

[Figure: subtype relationship notation in IE and IDEF1X, from the ERwin Methods Guide]

Neither IE nor IDEF1X directly allows specifying an abstract vs. concrete parent.

Physical Representation

Unfortunately, practical databases don't directly support inheritance, so you'll need to transform this diagram to real tables. There are generally 3 approaches for doing so:

1. Put all classes in the same table and leave child fields NULL-able. You can then have a CHECK to make sure the right subset of the fields is non-NULL.
   - Pros: No JOINing, so some queries can benefit. Can enforce parent-level keys (e.g. if you want to avoid different 2WD and 4WD vehicles having the same ID). Can easily enforce inclusive vs. exclusive children and abstract vs. concrete parent (by just varying the CHECK).
   - Cons: Some queries can be slower since they must filter out "uninteresting" children. Depending on your DBMS, child-specific constraints can be problematic. A lot of NULLs can waste storage. Less suitable for incomplete subtyping: adding a new child requires altering the existing table, which can be problematic in a production environment.
2. Put all children in separate tables, but don't have a table for the parent (instead, repeat the parent's fields and constraints in all children). Has most of the characteristics of (3) while avoiding JOINs, at the price of lower maintainability (due to all these field and constraint repetitions) and the inability to enforce parent-level keys or represent a concrete parent.
3. Put both parent and children in separate tables.
   - Pros: Clean. No fields/constraints need to be artificially repeated. Enforces parent-level keys and makes it easy to add child-specific constraints. Suitable for incomplete subtyping (relatively easy to add more child tables). Certain queries can benefit by only looking at "interesting" child table(s).
   - Cons: Some queries can be JOIN-heavy. Can be hard to enforce inclusive vs. exclusive children and abstract vs. concrete parent (these can be enforced declaratively if the DBMS supports circular and deferred foreign keys, but enforcing them at the application level is usually considered a lesser evil).
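
To make approaches (1) and (2) a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything beyond the entity names Vehicle, 2WD and 4WD is an assumption made for illustration: the columns `make`, `rear_axle` and `transfer_case`, the table names, and the helper `accepted` are all hypothetical. The CHECK shown is just one possible variant (exclusive children, abstract parent), and the syntax may need adjusting for other DBMSs.

```python
import sqlite3

# All table and column names are hypothetical; the question only names
# the entities Vehicle, 2WD and 4WD.
DDL = """
-- Approach 1: one table, child fields NULL-able, CHECK picks the valid subsets.
CREATE TABLE vehicle (
    vehicle_id    INTEGER PRIMARY KEY,
    make          TEXT NOT NULL,   -- parent-level field
    rear_axle     TEXT,            -- 2WD-only field
    transfer_case TEXT,            -- 4WD-only field
    -- Exactly one child populated: exclusive children, abstract parent.
    CHECK ((rear_axle IS NOT NULL) + (transfer_case IS NOT NULL) = 1)
);

-- Approach 2: one table per child, parent fields repeated, no parent table.
CREATE TABLE vehicle_2wd (
    vehicle_id INTEGER PRIMARY KEY,
    make       TEXT NOT NULL,
    rear_axle  TEXT NOT NULL
);
CREATE TABLE vehicle_4wd (
    vehicle_id    INTEGER PRIMARY KEY,
    make          TEXT NOT NULL,
    transfer_case TEXT NOT NULL
);
"""


def accepted(conn, sql, params=()):
    """Run one statement and report whether the constraints accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

single = "INSERT INTO vehicle (make, rear_axle, transfer_case) VALUES (?, ?, ?)"
approach_1 = {
    "2wd_only": accepted(conn, single, ("Acme", "open", None)),
    "4wd_only": accepted(conn, single, ("Acme", None, "part-time")),
    "both": accepted(conn, single, ("Acme", "open", "part-time")),
    "neither": accepted(conn, single, ("Acme", None, None)),
}

# Approach 2: no parent-level key, so both child tables can use ID 1.
approach_2_duplicate_id = (
    accepted(conn, "INSERT INTO vehicle_2wd VALUES (1, 'Acme', 'open')")
    and accepted(conn, "INSERT INTO vehicle_4wd VALUES (1, 'Acme', 'part-time')")
)

print(approach_1)
print(approach_2_duplicate_id)
```

In this sketch, "varying the CHECK" means changing the comparison: `<= 1` would allow a concrete parent (a vehicle with neither child populated), and `>= 1` would allow inclusive children (both populated). The second part shows the weakness of approach (2): with no parent table, nothing stops a 2WD row and a 4WD row from sharing the same ID.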

As you can see, the situation is less than ideal: you'll need to make compromises whichever approach you choose. Approach (3) should probably be your starting point; choose one of the alternatives only if there is a compelling reason to do so.
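
For approach (3), a rough sketch might look like the following, again using Python's sqlite3 module. The column names (`make`, `rear_axle`, `transfer_case`), the table names and the helper `accepted` are hypothetical. Note also that the `kind` discriminator column with a composite foreign key is a technique not mentioned above; it is included here as one possible way to enforce exclusive children declaratively. It does not make the parent abstract: a `vehicle` row can still exist without any child row.

```python
import sqlite3

# Hypothetical schema: only the entity names Vehicle, 2WD and 4WD come
# from the question. The "kind" discriminator is an added assumption.
DDL = """
CREATE TABLE vehicle (
    vehicle_id INTEGER PRIMARY KEY,
    kind       TEXT NOT NULL CHECK (kind IN ('2WD', '4WD')),
    make       TEXT NOT NULL,          -- parent-level field, stored once
    UNIQUE (vehicle_id, kind)          -- target for the composite FKs below
);

CREATE TABLE vehicle_2wd (
    vehicle_id INTEGER PRIMARY KEY,
    kind       TEXT NOT NULL DEFAULT '2WD' CHECK (kind = '2WD'),
    rear_axle  TEXT NOT NULL,          -- child-specific field
    FOREIGN KEY (vehicle_id, kind) REFERENCES vehicle (vehicle_id, kind)
);

CREATE TABLE vehicle_4wd (
    vehicle_id    INTEGER PRIMARY KEY,
    kind          TEXT NOT NULL DEFAULT '4WD' CHECK (kind = '4WD'),
    transfer_case TEXT NOT NULL,       -- child-specific field
    FOREIGN KEY (vehicle_id, kind) REFERENCES vehicle (vehicle_id, kind)
);
"""


def accepted(conn, sql, params=()):
    """Run one statement and report whether the constraints accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(DDL)

parent_ok = accepted(
    conn, "INSERT INTO vehicle (vehicle_id, kind, make) VALUES (1, '2WD', 'Acme')")
child_ok = accepted(
    conn, "INSERT INTO vehicle_2wd (vehicle_id, rear_axle) VALUES (1, 'open')")
# Same vehicle as a 4WD: rejected, because (1, '4WD') does not exist in vehicle.
second_child_ok = accepted(
    conn, "INSERT INTO vehicle_4wd (vehicle_id, transfer_case) VALUES (1, 'part-time')")
# Child row with no parent at all: rejected by the foreign key.
orphan_ok = accepted(
    conn, "INSERT INTO vehicle_2wd (vehicle_id, rear_axle) VALUES (99, 'open')")

# Reading a complete 2WD vehicle takes a JOIN, which is the cost of this approach.
rows = conn.execute(
    "SELECT v.make, c.rear_axle FROM vehicle v JOIN vehicle_2wd c USING (vehicle_id)"
).fetchall()
print(parent_ok, child_ok, second_child_ok, orphan_ok, rows)
```

Without the `kind` column, the plain version of approach (3) would simply have each child's `vehicle_id` reference `vehicle (vehicle_id)`, and exclusivity would have to be enforced elsewhere (for example, at the application level).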

¹ I'm guessing this is what the thickness of the line stands for in your diagrams.

² I'm guessing this is what the presence or absence of "disjoint" stands for in your diagrams.


answered Sep 21 '22 07:09

Branko Dimitrijevic
