Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How are super- and subtype relationships in ER diagrams represented as tables?

I am learning how to interpret Entity Relationship Diagrams into SQL DDL statements and I am confused by differences in notation. Consider a disjoint relationship as in the following diagram:

"Vehicle" box connects to "IsA" triangle, noted as "disjoint," which connects separately to "2WD" box and "4WD" box.

Would this be represented as:

  1. Vehicle, 2WD and 4WD tables (2WD and 4WD would point to the PK of Vehicle); or
  2. ONLY the 2WD and 4WD tables (and NO Vehicle table), both of which would duplicate whatever attributes Vehicle would have had?

I think these are other ways of writing the relationship:

"Vehicle" box connects with a thick line to "IsA" triangle, which connects with thin lines separately to "2WD" box and "4WD" box."Vehicle" box connects to "IsA" triangle, which connects separately to "2WD" box and "4WD" box, all by thin lines.

I'm looking for a clear explanation of the difference in regard to what tables you'd end up with for each diagram.

like image 501
xingyu Avatar asked Aug 20 '12 04:08

xingyu


People also ask

What are supertypes and subtypes entities and how are they represented in database?

A supertype is a generic entity type that has a relationship with one or more subtypes. A subtype is a sub-grouping of the entities in an entity type that is meaningful to the organization and that shares common attributes or relationships distinct from other subgroups.

How are relationships represented in an ER diagram?

In ER diagram, relationship type is represented by a diamond and connecting the entities with lines. A set of relationships of same type is known as relationship set.

How can we convert ER diagram to relation or table?

Mapping ProcessCreate table for weak entity set. Add all its attributes to table as field. Add the primary key of identifying entity set. Declare all foreign key constraints.

What is subtype in ER diagram?

Subtype - a subgroup of entities with unique attributes. Inheritance - the concept that subtype entities inherit the values of all supertype attributes.


1 Answers

ER Notation

There are several ER notations. I'm not familiar with the one you are using, but it's clear enough you are trying to represent a subtype (aka. inheritance, category, subclass, generalization hierarchy...). This is the relational cousin of the OOP inheritance.

When doing subtyping, you are generally concerned with the following design decisions:

  • Abstract vs. concrete: Can the parent be instantiated? In your example: can a Vehicle exist without also being 2WD or 4WD?1
  • Inclusive vs. exclusive: Can more than one child be instantiated for the same parent? In your example, can Vehicle be both 2WD and 4WD?2
  • Complete vs. incomplete: Do you expect more children to be added in the future? In your example, do you expect a Bike or a Plane (etc...) could be later added to the database model?

The Information Engineering notation differentiates between inclusive and exclusive subtype relationship. IDEF1X notation, on the other hand, doesn't (directly) recognize this difference, but it does differentiate between complete and incomplete subtype (which IE doesn't).

The following diagram from the ERwin Methods Guide (Chapter 5, Subtype Relationships) illustrates the difference:

enter image description here

Neither IE nor IDEF1X directly allow specifying abstract vs. concrete parent.

Physical Representation

Unfortunately, practical databases don't directly support inheritance, so you'll need to transform this diagram to real tables. There are generally 3 approaches for doing so:

  1. Put all classes in the same table and leave child fields NULL-able. You can then have a CHECK to make sure the right subset of the fields in non-NULL.
    • Pros: No JOINing, so some queries can benefit. Can enforce parent-level keys (e.g. if you want to avoid different 2WD and 4WD vehicles having the same ID). Can easily enforce inclusive vs. exclusive children and abstract vs. concrete parent (by just varying the CHECK).
    • Cons: Some queries can be slower since they must filter-out "uninteresting" children. Depending on your DBMS, child-specific constraints can be problematic. A lot of NULLs can waste storage. Less suitable for incomplete subtyping - adding new child requires altering the existing table, which can be problematic in a production environment.
  2. Put all children in separate tables, but don't have a table for the parent (instead, repeat parent's fields and constraints in all children). Has most of the the characteristics of (3) while avoiding JOINs, at the price of lower maintainability (due to all these field and constraint repetitions) and inability to enforce parent-level keys or represent a concrete parent.
  3. Put both parent and children in separate tables.
    • Pros: Clean. No fields/constraints need to be artificially repeated. Enforces parent-level keys and easy to add child-specific constraints. Suitable for incomplete subtyping (relatively easy to add more child tables). Certain queries can benefit by only looking at "interesting" child table(s).
    • Cons: Some queries can be JOIN-heavy. Can be hard to enforce inclusive vs. exclusive children and abstract vs. concrete parent (these can be enforced declaratively if the DBMS supports circular and deferred foreign keys, but enforcing them at the application level is usually considered a lesser evil).

As you can see, the situation is less than ideal - you'll need to make compromises whatever approach you choose. The approach (3) should probably be your starting point, and only choose one of the alternatives if there is a compelling reason to do so.


1 I'm guessing this is what thickness of the line stands for in your diagrams.

2 I'm guessing this is what presence or absence of "disjoint" stands for in your diagrams.

like image 112
Branko Dimitrijevic Avatar answered Sep 21 '22 07:09

Branko Dimitrijevic