How do you effectively model inheritance in a database?


There are several ways to model inheritance in a database. Which you choose depends on your needs. Here are a few options:

Table-Per-Type (TPT)

Each class has its own table. The base class table holds all the base class fields. Each derived class has its own table whose primary key is also a foreign key to the base class table; the derived table contains only the fields that the derived class adds.

So for example:

class Person {
    public int ID;
    public string FirstName;
    public string LastName;
}

class Employee : Person {
    public DateTime StartDate;
}

Would result in tables like:

table Person
------------
int id (PK)
string firstname
string lastname

table Employee
--------------
int id (PK, FK)
datetime startdate
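
As a rough sketch of how the TPT layout above might look as real DDL, here is a version using Python's built-in sqlite3 module. The lowercase table names, the TEXT column for the date, and the sample rows are assumptions made for illustration; they are not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE person (
        id        INTEGER PRIMARY KEY,
        firstname TEXT NOT NULL,
        lastname  TEXT NOT NULL
    );
    -- The derived table's primary key is also a foreign key to the base table.
    CREATE TABLE employee (
        id        INTEGER PRIMARY KEY REFERENCES person(id),
        startdate TEXT NOT NULL
    );
""")

# Hypothetical sample data: Ann is a plain Person, Bob is an Employee.
conn.execute("INSERT INTO person VALUES (1, 'Ann', 'Lee')")
conn.execute("INSERT INTO person VALUES (2, 'Bob', 'Ray')")
conn.execute("INSERT INTO employee VALUES (2, '2020-01-15')")

# Loading a full Employee means joining the derived table to the base table.
employees = conn.execute("""
    SELECT p.id, p.firstname, p.lastname, e.startdate
    FROM person AS p
    JOIN employee AS e ON e.id = p.id
""").fetchall()

# An employee row without a matching person row is rejected by the FK.
try:
    conn.execute("INSERT INTO employee VALUES (99, '2021-01-01')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The join is the price of TPT: every read of a derived type touches at least two tables, but no column is ever left empty.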

Table-Per-Hierarchy (TPH)

A single table represents the entire inheritance hierarchy, which means several of the columns will probably be sparse. A discriminator column is added to tell the system what type each row is.

Given the classes above, you end up with this table:

table Person
------------
int id (PK)
int rowtype (0 = "Person", 1 = "Employee")
string firstname
string lastname
datetime startdate

For any rows which are rowtype 0 (Person), the startdate will always be null.
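
The single-table layout can be sketched the same way with Python's sqlite3 module. The CHECK constraint that ties startdate to the discriminator is an addition for illustration (the layout above only states that startdate is null for Person rows), and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id        INTEGER PRIMARY KEY,
        -- Discriminator: 0 = Person, 1 = Employee
        rowtype   INTEGER NOT NULL CHECK (rowtype IN (0, 1)),
        firstname TEXT NOT NULL,
        lastname  TEXT NOT NULL,
        startdate TEXT,  -- only meaningful for Employee rows
        -- Assumed rule: plain Person rows must leave startdate NULL.
        CHECK (rowtype = 1 OR startdate IS NULL)
    );
""")

conn.execute("INSERT INTO person VALUES (1, 0, 'Ann', 'Lee', NULL)")
conn.execute("INSERT INTO person VALUES (2, 1, 'Bob', 'Ray', '2020-01-15')")

# Selecting one type is a filter on the discriminator; no join is needed.
employees = conn.execute(
    "SELECT id, firstname, startdate FROM person WHERE rowtype = 1"
).fetchall()

# A Person row carrying a startdate violates the CHECK constraint.
try:
    conn.execute("INSERT INTO person VALUES (3, 0, 'Cy', 'Fox', '2021-01-01')")
    bad_row_rejected = False
except sqlite3.IntegrityError:
    bad_row_rejected = True
```

Without a constraint like this, nothing in the database stops a row from having columns that do not belong to its type, which is the main integrity risk of TPH.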

Table-Per-Concrete-Type (TPC)

Each class has its own fully formed table with no references off to any other tables.

Given the classes above, you end up with these tables:

table Person
------------
int id (PK)
string firstname
string lastname

table Employee
--------------
int id (PK)
string firstname
string lastname
datetime startdate
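
A minimal sqlite3 sketch of the TPC layout follows. The table names, sample rows, and the 'kind' label in the query are assumptions for illustration. It shows one consequence of this pattern: because the tables are unrelated, listing everyone polymorphically takes a UNION, and ids are only unique within each table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two complete, unrelated tables; the shared columns are duplicated.
    CREATE TABLE person (
        id        INTEGER PRIMARY KEY,
        firstname TEXT NOT NULL,
        lastname  TEXT NOT NULL
    );
    CREATE TABLE employee (
        id        INTEGER PRIMARY KEY,
        firstname TEXT NOT NULL,
        lastname  TEXT NOT NULL,
        startdate TEXT NOT NULL
    );
""")

# Both rows use id 1: nothing links the two key spaces together.
conn.execute("INSERT INTO person VALUES (1, 'Ann', 'Lee')")
conn.execute("INSERT INTO employee VALUES (1, 'Bob', 'Ray', '2020-01-15')")

# A query over "all persons, including employees" has to union the tables.
everyone = conn.execute("""
    SELECT 'Person' AS kind, id, firstname, lastname FROM person
    UNION ALL
    SELECT 'Employee' AS kind, id, firstname, lastname FROM employee
    ORDER BY kind, id
""").fetchall()
```

If ids must be unique across the whole hierarchy, that has to be arranged separately (for example with a shared key generator), since the schema itself does not enforce it.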

Proper database design is nothing like proper object design.

If you are planning to use the database for anything other than simply serializing your objects (such as reports, querying, multi-application use, business intelligence, etc.) then I do not recommend any kind of a simple mapping from objects to tables.

Many people think of a row in a database table as an entity (I spent many years thinking in those terms), but a row is not an entity. It is a proposition. A database relation (i.e., table) represents some statement of fact about the world. The presence of the row indicates the fact is true (and conversely, its absence indicates the fact is false).

With this understanding, you can see that a single type in an object-oriented program may be stored across a dozen different relations. And a variety of types (united by inheritance, association, aggregation, or completely unaffiliated) may be partially stored in a single relation.

It is best to ask yourself: What facts do you want to store? What questions will you want answered? What reports do you want to generate?

Once the proper DB design is created, then it is a simple matter to create queries/views that allow you to serialize your objects to those relations.

Example:

In a hotel booking system, you may need to store the fact that Jane Doe has a reservation for a room at the Seaview Inn for April 10-12. Is that an attribute of the customer entity? Is it an attribute of the hotel entity? Is it a reservation entity with properties that include customer and hotel? It could be any or all of those things in an object oriented system. In a database, it is none of those things. It is simply a bare fact.

To see the difference, consider the following two queries. (1) How many hotel reservations does Jane Doe have for next year? (2) How many rooms are booked for April 10 at the Seaview Inn?

In an object-oriented system, query (1) is an attribute of the customer entity, and query (2) is an attribute of the hotel entity. Those are the objects that would expose those properties in their APIs. (Though, obviously the internal mechanisms by which those values are obtained may involve references to other objects.)

In a relational database system, both queries would examine the reservation relation to get their numbers, and conceptually there is no need to bother with any other "entity".
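
To make this concrete, here is a small sqlite3 sketch in which both questions are answered from a single reservation relation. The column names, the year 2025, the inclusive first_night/last_night convention, and every row other than Jane Doe's Seaview Inn stay are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each row asserts one fact: this guest holds this room for these nights.
    CREATE TABLE reservation (
        guest       TEXT NOT NULL,
        hotel       TEXT NOT NULL,
        room        INTEGER NOT NULL,
        first_night TEXT NOT NULL,  -- ISO dates, inclusive on both ends
        last_night  TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO reservation VALUES (?, ?, ?, ?, ?)",
    [
        ("Jane Doe", "Seaview Inn", 101, "2025-04-10", "2025-04-12"),
        ("Jane Doe", "Hilltop Lodge", 7, "2025-08-01", "2025-08-03"),
        ("John Roe", "Seaview Inn", 102, "2025-04-09", "2025-04-10"),
        ("John Roe", "Seaview Inn", 103, "2025-04-11", "2025-04-12"),
    ],
)

# (1) How many reservations does Jane Doe have in the given year?
jane_count = conn.execute(
    """SELECT COUNT(*) FROM reservation
       WHERE guest = ? AND first_night BETWEEN ? AND ?""",
    ("Jane Doe", "2025-01-01", "2025-12-31"),
).fetchone()[0]

# (2) How many rooms are booked at the Seaview Inn on April 10?
rooms_booked = conn.execute(
    """SELECT COUNT(DISTINCT room) FROM reservation
       WHERE hotel = ? AND ? BETWEEN first_night AND last_night""",
    ("Seaview Inn", "2025-04-10"),
).fetchone()[0]
```

Neither query needs a customer table or a hotel table; both read the same set of facts from a different angle.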

Thus, it is by attempting to store facts about the world—rather than attempting to store entities with attributes—that a proper relational database is constructed. And once it is properly designed, then useful queries that were undreamt of during the design phase can be easily constructed, since all the facts needed to fulfill those queries are in their proper places.


Short answer: you don't.

If you need to serialize your objects, use an ORM, or even better something like ActiveRecord or prevalence.

If you need to store data, store it in a relational manner (being careful about what you are storing, and paying attention to what Jeffrey L Whitledge just said), not one affected by your object design.


The TPT, TPH and TPC patterns are the way to go, as mentioned by Brad Wilson. But a couple of notes:

  • Child classes that inherit from a base class can be seen as weak entities relative to the base class's table: they depend on the base row and cannot exist without it. I have often seen each child table given its own unique ID in addition to the FK to the parent table. The FK alone is enough, and it is even better to enable on-delete cascade on the FK relation between the child and base tables.

  • In TPT, the base table records alone do not tell you which child class a record represents. This is sometimes needed when you want to load a list of all records without running a select against every child table. One way to handle this is to add a column that stores the type of the child class (similar to the rowType field in TPH), in effect mixing TPT and TPH.

Say we want to design a database that holds the following shape class diagram:

public class Shape {
    int id;
    Color color;
    Thickness thickness;
    // other fields
}

public class Rectangle : Shape {
    Point topLeft;
    Point bottomRight;
}

public class Circle : Shape {
    Point center;
    int radius;
}

The database design for the above classes can be like this:

table Shape
-----------
int id; (PK)
int color;
int thickness;
int rowType; (0 = Rectangle, 1 = Circle, 2 = ...)

table Rectangle
----------
int ShapeID; (FK on delete cascade)
int topLeftX;
int topLeftY;
int bottomRightX;
int bottomRightY;

table Circle
----------
int ShapeID; (FK on delete cascade)
int centerX;
int centerY;
int radius;
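
Here is a hedged sqlite3 sketch of the shape tables above. Making the child's ShapeID column the primary key as well as the foreign key is one reading of "one FK is enough", and the snake_case column names, the TYPE_NAMES mapping, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cascading deletes only work in SQLite when foreign keys are enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE shape (
        id        INTEGER PRIMARY KEY,
        color     INTEGER NOT NULL,
        thickness INTEGER NOT NULL,
        rowtype   INTEGER NOT NULL  -- 0 = Rectangle, 1 = Circle
    );
    -- The child key doubles as the FK, so no separate id column is needed.
    CREATE TABLE rectangle (
        shape_id       INTEGER PRIMARY KEY
                       REFERENCES shape(id) ON DELETE CASCADE,
        top_left_x     INTEGER NOT NULL,
        top_left_y     INTEGER NOT NULL,
        bottom_right_x INTEGER NOT NULL,
        bottom_right_y INTEGER NOT NULL
    );
    CREATE TABLE circle (
        shape_id INTEGER PRIMARY KEY
                 REFERENCES shape(id) ON DELETE CASCADE,
        center_x INTEGER NOT NULL,
        center_y INTEGER NOT NULL,
        radius   INTEGER NOT NULL
    );
""")

conn.execute("INSERT INTO shape VALUES (1, 255, 2, 0)")
conn.execute("INSERT INTO rectangle VALUES (1, 0, 0, 10, 5)")
conn.execute("INSERT INTO shape VALUES (2, 128, 1, 1)")
conn.execute("INSERT INTO circle VALUES (2, 3, 4, 7)")

# The type of every shape is known from the base table alone.
TYPE_NAMES = {0: "Rectangle", 1: "Circle"}
listing = [
    (shape_id, TYPE_NAMES[rowtype])
    for shape_id, rowtype in conn.execute(
        "SELECT id, rowtype FROM shape ORDER BY id"
    )
]

# Deleting the base row removes the dependent child row as well.
conn.execute("DELETE FROM shape WHERE id = 2")
circles_left = conn.execute("SELECT COUNT(*) FROM circle").fetchone()[0]
rectangles_left = conn.execute("SELECT COUNT(*) FROM rectangle").fetchone()[0]
```

The discriminator in the base table is redundant with the presence of a child row, so the application has to keep the two in agreement; that is the trade-off for avoiding a lookup in every child table.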