I've been doing quite a bit of searching, but haven't been able to find many resources on the topic. My goal is to store scheduling data like you would find in a Gantt chart. So one example of storing the data might be:
Task Id | Name | Duration
1 Task A 1
2 Task B 3
3 Task C 2
Task Id | Predecessors
1 Null
2 Null
3 1
3 2
Which would have Task C waiting for both Task A and Task B to complete.
So my question is: What is the best way to store this kind of data and efficiently query it? Any good resources for this kind of thing? There is a ton of information about tree structures, but once you add in multiple parents it becomes hard to find info. By the way, I'm working with SQL Server and .NET for this task.
to put or keep things in a special place for use in the future: The data is stored on a hard disk and backed up on a CD.
(or squirrelling (away)), stashing, stockpiling, stowing, treasuring.
Storage is an important marketing function, which involves holding and preserving goods from the time they are produced until they are needed for consumption. The storage of goods, therefore, from the time of production to the time of consumption, ensures a continuous flow of goods in the market.
Storing means to collect and keep something for a long time and protect it from microorganisms. Ex- farmers store the grains in granaries and silos.
Your problem is related to the concept of relationship cardinality. All relationships have some cardinality, which expresses the potential number of instances on each side of the relationship that are members of it, or can participate in a single instance of the relationship. As an example, for people, (for most living things, I guess, with rare exceptions), the Parent-Child relationship has a cardinality of 2 to zero or many
, meaning it takes two parents on the parent side, and there can be zero or many children (perhaps it should be 2 to 1 or many
)
In database design, generally, anything that has a 1(one), (or a zero or one), on one side can be easily represented with just two tables, one for each entity, (sometimes only one table is needed see note**) and a foreign key column in the table representing the "many" side, that points to the other table holding the entity on the "one" side.
In your case you have a many to many
relationship. (A Task can have multiple predecessors, and each predecessors can certainly be the predecessor for multiple tasks) In this case a third table is needed, where each row, effectively, represents an association between 2 tasks, representing that one is the predecessor to the other. Generally, This table is designed to contain only all the columns of the primary keys of the two parent tables, and it's own primary key is a composite of all the columns in both parent Primary keys. In your case it simply has two columns, the taskId, and the PredecessorTaskId, and this pair of Ids should be unique within the table so together they form the composite PK.
When querying, to avoid double counting data columns in the parent tables when there are multiple joins, simply base the query on the parent table... e.g., to find the duration of the longest parent, Assuming your association table is named TaskPredecessor
Select TaskId, Max(P.Duration)
From Task T Join Task P
On P.TaskId In (Select PredecessorId
From TaskPredecessor
Where TaskId = T.TaskId)
** NOTE. In cases where both entities in the relationship are of the same entity type, they can both be in the same table. The canonical (luv that word) example is an employee table with the many to one relationship of Worker to Supervisor... Since the supervisor is also an employee, both workers and supervisors can be in the same [Employee] table, and the realtionship can gbe modeled with a Foreign Key (called say SupervisorId) that points to another row in the same table and contains the Id of the employee record for that employee's supervisor.
Use adjacency list model:
chain
task_id predecessor
3 1
3 2
and this query to find all predecessors of the given task:
WITH q AS
(
SELECT predecessor
FROM chain
WHERE task_id = 3
UNION ALL
SELECT c.predecessor
FROM q
JOIN chain c
ON c.task_id = q.predecessor
)
SELECT *
FROM q
To get the duration of the longest parent for each task:
WITH q AS
(
SELECT task_id, duration
FROM tasks
UNION ALL
SELECT t.task_id, t.duration
FROM q
JOIN chain с
ON c.task_id = q.task_id
JOIN tasks t
ON t.task_id = c.predecessor
)
SELECT task_id, MAX(duration)
FROM q
Check "Hierarchical Weighted total" pattern in "SQL design patterns" book, or "Bill Of Materials" section in "Trees and Hierarchies in SQL".
In a word, graphs feature double aggregation. You do one kind of aggregation along the nodes in each path, and another one across alternative paths. For example, find a minimal distance between the two nodes is minimum over summation. Hierarchical weighted total query (aka Bill Of Materials) is multiplication of the quantities along each path, and summation along each alternative path:
with TCAssembly as ( select Part, SubPart, Quantity AS factoredQuantity from AssemblyEdges where Part = ‘Bicycle’ union all select te.Part, e.SubPart, e.Quantity * te.factoredQuantity from TCAssembly te, AssemblyEdges e where te.SubPart = e.Part ) select SubPart, sum(Quantity) from TCAssembly group by SubPart
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With