Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How To Develop A Database Schema For A Tree Structure(Directed acyclic graph)

I'm using below tree structure and planning to develop a db schema for the below.

enter image description here

What I have development up to now is below,

enter image description here

The problem I'm having is if I search for Y, below tree should be generated.

enter image description here

Logic I'm using is, Y has two cross references X,Z, those two nodes should be in the diagram and the parent nodes all the way to the starting parent node.

Given that I'm using PHP to generate this tree using a mysql db table as shown above. DB structure can be changed. I searched on google for a similar tree structure but I couldn't find any help.

Note

I'm not asking for you to write code for me. All I'm asking is some guidelines how this should be done.

I found below helpful but still different to my scenario

What is the most efficient/elegant way to parse a flat table into a tree?

How to represent a tree like structure in a db

If anyone can please tell me what php libraries should I use to generate the tree and what's the suitable db structure to be used?

like image 643
Techie Avatar asked Aug 18 '13 13:08

Techie


People also ask

Can a directed acyclic graph be a tree?

A polytree (or directed tree or oriented tree or singly connected network) is a directed acyclic graph (DAG) whose underlying undirected graph is a tree. In other words, if we replace its directed edges with undirected edges, we obtain an undirected graph that is both connected and acyclic.

Can a DAG be a tree?

A Tree is just a restricted form of a Graph. Trees have direction (parent / child relationships) and don't contain cycles. They fit with in the category of Directed Acyclic Graphs (or a DAG). So Trees are DAGs with the restriction that a child can only have one parent.

What data structure is used in DAG?

A directed acyclic graph (DAG) is a conceptual representation of a series of activities. The order of the activities is depicted by a graph, which is visually presented as a set of circles, each one representing an activity, some of which are connected by lines, which represent the flow from one activity to another.


3 Answers

Your db structure doesn't seem to be a tree,it's just graph.

I suggest you to throw out relational database for this structure and take look at some of Graph Database like Neo4j ,OrientDB and Infinite Graph.

But if your forced to use MySQL you can use FlockDB which can be used to traverse MySQL Node(see row as node) but with some limitation. or you can test another MySQL engine like OQGRAPH which provide graph engine for MySQL and MariaDB.

like image 71
Moein Hosseini Avatar answered Sep 28 '22 13:09

Moein Hosseini


Let me restate the question just to make sure I understand it right. You have Nodes and two kind of relations - arrows and vertical lines. Then given a Node N, you want to generate a subset S(N) of Nodes with the following recursive rules:

  • Rule 0: N is in S(N).
  • Rule 1: If Node M is connected with N via vertical relation, it is also in S(N).
  • Rule 2: If Node M is in S(N), then any Node with arrow pointing to M is also in S(N).

The set S(N) is the minimal set of Nodes satisfying those rules.

The example your give with N being Y, seem to confirm these rules.

However, there is another different but (at least to me) more natural set of Rules, where Rule 1 above is replaced by

  • Rule 1': If Node M is in S(N), then any Node connected with M via vertical relation is also in S(N).

In the sequel I will assume Rules 0,1,2 confirmed by your example, but similar approach can be used for Rules 0,1',2 or any modification.


Also I understand there is a mistake in your table in row 7, it should be:

7    B2    B    B1,B3 (not B2,B3)

Now to the proposed solution.

I would first slightly modify your table structure: Since "id" is your primary key, the rule for a foreign key is to point to the primary key of the related entry. That is, in your case, I would replace "node_id" by "node_name" or something like that just not to confuse with "id", and replace entries of "node_parent_id" and "cross_ref" by their "id"s. For instance, the row number 15 would look like that:

15    'Y'    [13]    [14,16]

Alternatively, if you prefer for readability reasons, you can use A, B, X, Y etc. as primary keys, provided they are unique, of course, then your table would remain the same, with exception of the "id" field that is not needed. I will assume the first case, but you can adapt it to the second with simple replacement.

That is all you need, as far as the table goes.


Now you need a recursive function to generate the subgraph S(N) for each given Node N.

I will implement the set S as array $set of all 'id's of its Nodes. Then I will define two functions - one to expand the set initially based on Rules 1,2 and the other to expand the set subsequently based on Rule 2 only.

For simplicity, I will assume your table is imported into memory as associative array $rows of rows, such that $rows[$id] represents the row with 'id' equal to $id as, again, associative array. So

$rows[15] = array('id'=>15, 
                  'node_name'=>'Y', 
                  'node_parent_id'=>array(13), 
                  'cross_ref'=>array(14,16)
);

Here is the code for the functions:

function initial_expand_set ($node_id) {
    global($rows); // to access the array from outside
    $set = array($node_id);    // Rule 0
    $row = $rows[$node_id];    // Get current Node

    $parents = $row['node_parent_id'];  // Get parents of the Node
    $set = $parents ? array_merge ($set, $parents) : $set;   // Join parents to the set

    $vert_brothers = $row['cross_ref'];  // Get vertical relations
    $set = $vert_brothers ? array_merge ($set, $vert_brothers) : $set;

    $set = expand_set($set);  // Recursive function defined below
    return $set;
}

And the recursive function:

// iterate over nodes inside the set, expand each node, and merge all together
function expand_set (array $set) {
    global($rows);
    $expansion = $set;  //  Initially $expansion is the same as $set
    foreach ($set as $node_id) {
        $row = $rows[$node_id];    // Get Node 

        // Get the set of parents of the Node - this is our new set
        $parents = $row['node_parent_id'];  // Get parents of the Node

        // apply the recursion
        $parents_expanded = expand_set($parents);

        // merge with previous expansion
        $expansion = array_merge($expansion, $parents_expanded);
    }
    // after the loop is finished getting rid of redundant entries
    // array_unique generates associative array, for which I only collect values
    $expansion = array_values( array_unique($expansion) );
    return $expansion;
}

Hope it works for you. :)

If any further details or explanations needed, I will be happy to assist.

PS. For the pedants among readers note that I use space before '(' for function definition and no space for function calls, as recommended by Douglas Crokford.

like image 29
Dmitri Zaitsev Avatar answered Sep 28 '22 12:09

Dmitri Zaitsev


Your database structure is not normalised, because you have multiple ids in both node_parent_id and cross_refer. You should separate this information out into separate tables.

So, you would have your nodes table, plus a second table to describe the parent-child relationships; this table would have a child node id and a parent node id.

The cross-references should be in a third table, which again has two node id columns, but there are two ways you can do this, because the cross-references are bi-directional. One way is to store each cross-reference only once, which means when you query the table, you have to check both possibilities (a cross reference between X and Y could be stored with X in the first column and Y in the second column, or the other way around, so to find X you would have to check both columns). The other way is to store each cross-reference twice, once for each direction. This makes querying simpler but it is storing redundant data and can lead to inconsistencies, e.g. if for some reason one reference is deleted but the other is not.

Finding paths with this structure becomes a much simpler matter because you don't have to additionally parse comma-separated strings, which is both more complex and less efficient.

You can also use this to ensure referential integrity, so that, for example, a node doesn't have a parent id that doesn't actually exist in the database.

For more information, research "database normalisation". (Or you can spell it with a "z" if you are so inclined ;-P)

like image 29
leftclickben Avatar answered Sep 28 '22 11:09

leftclickben