Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Interesting tree/hierarchical data structure problem

Colleges have different ways of organizing their departments. Some schools go School -> Term -> Department. Others have steps in between, with the longest being School -> Sub_Campus -> Program -> Term -> Division -> Department.

School, Term, and Department are the only ones that always exist in a school's "tree" of departments. The order of these categories never changes, with the second example I gave you being the longest. Every step down is a 1:N relationship.

Now, I'm not sure how to set up the relationships between the tables. For example, what columns are in Term? Its parent could be a Program, Sub_Campus, or School. Which one it is depends on the school's system. I could conceive of setting up the Term table to have foreign keys for all of those (which all would default to NULL), but I'm not sure this is the canonical way of doing things here.

like image 612
bgcode Avatar asked Sep 23 '11 21:09

bgcode


4 Answers

I suggest you better use a general table, called e.g. Entity which would contain id field and a self-referencing parent field.

Each relevant table would contain a field pointing to Entity's id (1:1). In a way each table would be a child of the Entity table.

like image 98
MarianP Avatar answered Oct 18 '22 20:10

MarianP


Here's one design possibility:

This option takes advantage of your special constraints. Basically you generalize all hierarchies as that of the longest form by introducing generic nodes. If school doesn't have "sub campus" then just assign it a generic sub campus called "Main". For example, School -> Term -> Department can be thought of same as School -> Sub_Campus = Main -> Program=Main -> Term -> Division=Main -> Department. In this case, we assign a node called "Main" as default when school doesn't have that nodes. Now you can just have a boolean flag property for these generic nodes that indicates that they are just placeholders and this flag would allow you to filter it out in middle layer or in UX if needed.

This design will allow you to take advantage of all relational constraints as usual and simplify handling of missing node types in your code.

like image 30
Shital Shah Avatar answered Oct 18 '22 18:10

Shital Shah


-- Enforcing a taxonomy by self-referential (recursive) tables.
-- Both the classes and the instances have a recursive structure.
-- The taxonomy is enforced mostly based on constraints on the classes,
-- the instances only need to check that {their_class , parents_class} 
-- form a valid pair.
--
DROP schema school CASCADE;
CREATE schema school;

CREATE TABLE school.category
  ( id INTEGER NOT NULL PRIMARY KEY
  , category_name VARCHAR
  );
INSERT INTO school.category(id, category_name) VALUES
  ( 1, 'School' )
  , ( 2, 'Sub_campus' )
  , ( 3, 'Program' )
  , ( 4, 'Term' )
  , ( 5, 'Division' )
  , ( 6, 'Department' )
  ;

-- This table contains a list of all allowable {child->parent} pairs.
-- As a convention, the "roots" of the trees point to themselves.
-- (this also avoids a NULL FK)
CREATE TABLE school.category_valid_parent
  ( category_id INTEGER NOT NULL REFERENCES school.category (id)
  , parent_category_id INTEGER NOT NULL REFERENCES school.category (id)
  );
ALTER TABLE school.category_valid_parent
  ADD PRIMARY KEY (category_id, parent_category_id)
  ;

INSERT INTO school.category_valid_parent(category_id, parent_category_id)
  VALUES
  ( 1,1) -- school -> school
  , (2,1) -- subcampus -> school
  , (3,1) -- program -> school
  , (3,2) -- program -> subcampus
  , (4,1) -- term -> school
  , (4,2) -- term -> subcampus
  , (4,3) -- term -> program
  , (5,4) -- division --> term
  , (6,4) -- department --> term
  , (6,5) -- department --> division
  ;

CREATE TABLE school.instance
  ( id INTEGER NOT NULL PRIMARY KEY
  , category_id INTEGER NOT NULL REFERENCES school.category (id)
  , parent_id INTEGER NOT NULL REFERENCES school.instance (id)
  -- NOTE: parent_category_id is logically redundant
  -- , but needed to maintain the constraint
  -- (without referencing a third table)
  , parent_category_id INTEGER NOT NULL REFERENCES school.category (id)
  , instance_name VARCHAR
  );      -- Forbid illegal combinations of {parent_id, parent_category_id}
ALTER TABLE school.instance ADD CONSTRAINT valid_cat UNIQUE (id,category_id);
ALTER TABLE school.instance
  ADD FOREIGN KEY (parent_id, parent_category_id)
      REFERENCES school.instance(id, category_id);
  ;
  -- Forbid illegal combinations of {category_id, parent_category_id}
ALTER TABLE school.instance
  ADD FOREIGN KEY (category_id, parent_category_id) 
      REFERENCES school.category_valid_parent(category_id, parent_category_id);
  ;

INSERT INTO school.instance(id, category_id
    , parent_id, parent_category_id
    , instance_name) VALUES
  -- Zulo
  (1,1,1,1, 'University of Utrecht' )
  , (2,2,1,1, 'Uithof' )
  , (3,3,2,2, 'Life sciences' )
  , (4,4,3,3, 'Bacherlor' )
  , (5,5,4,4, 'Biology' )
  , (6,6,5,5, 'Evolutionary Biology' )
  , (7,6,5,5, 'Botany' )
  -- Nulo
  , (11,1,11,1, 'Hogeschool Utrecht' )
  , (12,4,11,1, 'Journalistiek' )
  , (13,6,12,4, 'Begrijpend Lezen' )
  , (14,6,12,4, 'Typvaardigheid' )
  ;

  -- try to insert an invalid instance
INSERT INTO school.instance(id, category_id
    , parent_id, parent_category_id
    , instance_name) VALUES
  ( 15, 6, 3,3, 'Procreation' );

WITH RECURSIVE re AS (
  SELECT i0.parent_id AS pa_id
  , i0.parent_category_id AS pa_cat
  , i0.id AS my_id
  , i0.category_id AS my_cat
  FROM school.instance i0
  WHERE i0.parent_id = i0.id
  UNION
  SELECT i1.parent_id AS pa_id
  , i1.parent_category_id AS pa_cat
  , i1.id AS my_id
  , i1.category_id AS my_cat
  FROM school.instance i1
  , re
  WHERE re.my_id = i1.parent_id
  )
SELECT re.*
  , ca.category_name
  , ins.instance_name
  FROM re
  JOIN school.category ca ON (re.my_cat = ca.id)
  JOIN school.instance ins ON (re.my_id = ins.id)
  -- WHERE re.my_id = 14
  ;

The output:

INSERT 0 11
ERROR:  insert or update on table "instance" violates foreign key constraint "instance_category_id_fkey1"
DETAIL:  Key (category_id, parent_category_id)=(6, 3) is not present in table "category_valid_parent".
 pa_id | pa_cat | my_id | my_cat | category_name |     instance_name 
-------+--------+-------+--------+---------------+-----------------------
     1 |      1 |     1 |      1 | School        | University of Utrecht
    11 |      1 |    11 |      1 | School        | Hogeschool Utrecht
     1 |      1 |     2 |      2 | Sub_campus    | Uithof
    11 |      1 |    12 |      4 | Term          | Journalistiek
     2 |      2 |     3 |      3 | Program       | Life sciences
    12 |      4 |    13 |      6 | Department    | Begrijpend Lezen
    12 |      4 |    14 |      6 | Department    | Typvaardigheid
     3 |      3 |     4 |      4 | Term          | Bacherlor
     4 |      4 |     5 |      5 | Division      | Biology
     5 |      5 |     6 |      6 | Department    | Evolutionary Biology
     5 |      5 |     7 |      6 | Department    | Botany
(11 rows)

BTW: I left out the attributes. I propose they could be hooked to the relevant categories by means of a EAV type of data model.

like image 3
wildplasser Avatar answered Oct 18 '22 18:10

wildplasser


I'm going to start by discussing implementing a single hierarchical model (just 1:N relationships) relationally.

Let's use your example School -> Term -> Department.

Here's code that I generated using MySQLWorkbench (I removed a few things to make it clearer):

-- -----------------------------------------------------
-- Table `mydb`.`school`  
-- -----------------------------------------------------
-- each of these tables would have more attributes in a real implementation
-- using varchar(50)'s for PKs because I can -- :)

CREATE  TABLE IF NOT EXISTS `mydb`.`school` (
  `school_name` VARCHAR(50) NOT NULL ,
  PRIMARY KEY (`school_name`) 
);

-- -----------------------------------------------------
-- Table `mydb`.`term`
-- -----------------------------------------------------
CREATE  TABLE IF NOT EXISTS `mydb`.`term` (
  `term_name` VARCHAR(50) NOT NULL ,
  `school_name` VARCHAR(50) NOT NULL ,
  PRIMARY KEY (`term_name`, `school_name`) ,
  FOREIGN KEY (`school_name` )
    REFERENCES `mydb`.`school` (`school_name` )
);

-- -----------------------------------------------------
-- Table `mydb`.`department`
-- -----------------------------------------------------
CREATE  TABLE IF NOT EXISTS `mydb`.`department` (
  `dept_name` VARCHAR(50) NOT NULL ,
  `term_name` VARCHAR(50) NOT NULL ,
  `school_name` VARCHAR(50) NOT NULL ,
  PRIMARY KEY (`dept_name`, `term_name`, `school_name`) ,
  FOREIGN KEY (`term_name` , `school_name` )
    REFERENCES `mydb`.`term` (`term_name` , `school_name` )
);

Here is the MySQLWorkbench version of the data model:
MySQLWorkbench version

As you can see, school, at the top of the hierarchy, has only school_name as its key, whereas department has a three-part key including the keys of all of its parents.

Key points of this solution

  • uses natural keys -- but could be refactored to use surrogate keys (SO question -- along with UNIQUE constraints on multi-column foreign keys)
  • every level of nesting adds one column to the key
  • each table's PK is the entire PK of the table above it, plus an additional column specific to that table

Now for the second part of your question.

My interpretation of the question
There is a hierarchical data model. However, some applications require all of the tables, whereas others utilize only some of the tables, skipping the others. We want to be able to implement 1 single data model and use it for both of these cases.

You could use the solution given above, and, as ShitalShah mentioned, add a default value to any table which would not be used. Let's see some example data, using the model given above, where we only want to save School and Department information (no Terms):

+-------------+
| school_name |
+-------------+
| hogwarts    |
| uCollege    |
| uMatt       |
+-------------+
3 rows in set (0.00 sec)

+-----------+-------------+
| term_name | school_name |
+-----------+-------------+
| default   | hogwarts    |
| default   | uCollege    |
| default   | uMatt       |
+-----------+-------------+
3 rows in set (0.00 sec)

+-------------------------------+-----------+-------------+
| dept_name                     | term_name | school_name |
+-------------------------------+-----------+-------------+
| defense against the dark arts | default   | hogwarts    |
| potions                       | default   | hogwarts    |
| basket-weaving                | default   | uCollege    |
| history of magic              | default   | uMatt       |
| science                       | default   | uMatt       |
+-------------------------------+-----------+-------------+
5 rows in set (0.00 sec)

Key points

  • there is a default value in term for every value in school -- this could be quite annoying if you had a table deep in the hierarchy that an application didn't need
  • since the table schema doesn't change, the same queries can be used
  • queries are easy to write and portable
  • SO seems to think default should be colored differently

There is another solution to storing trees in databases. Bill Karwin discusses it here, starting around slide 49, but I don't think this is the solution you want. Karwin's solution is for trees of any size, whereas your examples seem to be relatively static. Also, his solutions come with their own set of problems (but doesn't everything?).


I hope that helps with your question.

like image 1
Matt Fenwick Avatar answered Oct 18 '22 19:10

Matt Fenwick