Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Storing & Querying Heirarchical Data with Multiple Parent Nodes

I've been doing quite a bit of searching, but haven't been able to find many resources on the topic. My goal is to store scheduling data like you would find in a Gantt chart. So one example of storing the data might be:

Task Id | Name    | Duration
1         Task A    1
2         Task B    3
3         Task C    2

Task Id | Predecessors
1         Null
2         Null
3         1
3         2

Which would have Task C waiting for both Task A and Task B to complete.

So my question is: What is the best way to store this kind of data and efficiently query it? Any good resources for this kind of thing? There is a ton of information about tree structures, but once you add in multiple parents it becomes hard to find info. By the way, I'm working with SQL Server and .NET for this task.

like image 494
Ocelot20 Avatar asked Aug 27 '10 12:08

Ocelot20


People also ask

What do you mean storing?

to put or keep things in a special place for use in the future: The data is stored on a hard disk and backed up on a CD.

What is another word for storing something?

(or squirrelling (away)), stashing, stockpiling, stowing, treasuring.

Why is storing necessary?

Storage is an important marketing function, which involves holding and preserving goods from the time they are produced until they are needed for consumption. The storage of goods, therefore, from the time of production to the time of consumption, ensures a continuous flow of goods in the market.

What is storing means Brainly?

Storing means to collect and keep something for a long time and protect it from microorganisms. Ex- farmers store the grains in granaries and silos.


3 Answers

Your problem is related to the concept of relationship cardinality. All relationships have some cardinality, which expresses the potential number of instances on each side of the relationship that are members of it, or can participate in a single instance of the relationship. As an example, for people, (for most living things, I guess, with rare exceptions), the Parent-Child relationship has a cardinality of 2 to zero or many, meaning it takes two parents on the parent side, and there can be zero or many children (perhaps it should be 2 to 1 or many)

In database design, generally, anything that has a 1(one), (or a zero or one), on one side can be easily represented with just two tables, one for each entity, (sometimes only one table is needed see note**) and a foreign key column in the table representing the "many" side, that points to the other table holding the entity on the "one" side.

In your case you have a many to many relationship. (A Task can have multiple predecessors, and each predecessors can certainly be the predecessor for multiple tasks) In this case a third table is needed, where each row, effectively, represents an association between 2 tasks, representing that one is the predecessor to the other. Generally, This table is designed to contain only all the columns of the primary keys of the two parent tables, and it's own primary key is a composite of all the columns in both parent Primary keys. In your case it simply has two columns, the taskId, and the PredecessorTaskId, and this pair of Ids should be unique within the table so together they form the composite PK.

When querying, to avoid double counting data columns in the parent tables when there are multiple joins, simply base the query on the parent table... e.g., to find the duration of the longest parent, Assuming your association table is named TaskPredecessor

  Select TaskId, Max(P.Duration)
  From Task T Join Task P
     On P.TaskId In (Select PredecessorId 
                     From TaskPredecessor
                     Where TaskId = T.TaskId)

** NOTE. In cases where both entities in the relationship are of the same entity type, they can both be in the same table. The canonical (luv that word) example is an employee table with the many to one relationship of Worker to Supervisor... Since the supervisor is also an employee, both workers and supervisors can be in the same [Employee] table, and the realtionship can gbe modeled with a Foreign Key (called say SupervisorId) that points to another row in the same table and contains the Id of the employee record for that employee's supervisor.

like image 64
Charles Bretana Avatar answered Sep 19 '22 02:09

Charles Bretana


Use adjacency list model:

chain

task_id  predecessor
3        1
3        2

and this query to find all predecessors of the given task:

WITH    q AS
        (
        SELECT  predecessor
        FROM    chain
        WHERE   task_id = 3
        UNION ALL
        SELECT  c.predecessor
        FROM    q
        JOIN    chain c
        ON      c.task_id = q.predecessor
        )
SELECT  *
FROM    q

To get the duration of the longest parent for each task:

WITH    q AS
        (
        SELECT  task_id, duration
        FROM    tasks
        UNION ALL
        SELECT  t.task_id, t.duration
        FROM    q
        JOIN    chain с
        ON      c.task_id = q.task_id
        JOIN    tasks t
        ON      t.task_id = c.predecessor
        )
SELECT  task_id, MAX(duration)
FROM    q
like image 33
Quassnoi Avatar answered Sep 21 '22 02:09

Quassnoi


Check "Hierarchical Weighted total" pattern in "SQL design patterns" book, or "Bill Of Materials" section in "Trees and Hierarchies in SQL".

In a word, graphs feature double aggregation. You do one kind of aggregation along the nodes in each path, and another one across alternative paths. For example, find a minimal distance between the two nodes is minimum over summation. Hierarchical weighted total query (aka Bill Of Materials) is multiplication of the quantities along each path, and summation along each alternative path:

     
       with TCAssembly as (
          select Part, SubPart, Quantity AS factoredQuantity
          from AssemblyEdges
          where Part = ‘Bicycle’
          union all
          select te.Part, e.SubPart, e.Quantity * te.factoredQuantity
          from TCAssembly te, AssemblyEdges e
          where te.SubPart = e.Part
       ) select SubPart, sum(Quantity) from TCAssembly
       group by SubPart        
like image 38
Tegiri Nenashi Avatar answered Sep 19 '22 02:09

Tegiri Nenashi