Let's say I have an acyclic directed graph such as a family "tree" (not really a tree since a child has 2 parents). I want to place a representation of this graph in a relational database so that it's fast to compute all ancestors of a node, and all descendants of a node. How would you represent this graph? How would you query for all descendants? How would you insert and remove nodes and relationships? What assumptions are you making about the data?
The best solution will have the best big O for the number of select/insert/delete statements you run to query ancestors and descendants, with ties broken by best big O for total runtime, with ties broken by space requirements.
My coworker posed this question to me. I have a solution but it's exponential size in the worst case so I wanted to see how other people would solve it.
edit
Clarified relational database. This question is trivial (and boring) if you use graph databases with built in transitive closures.
A database availability group (DAG) is a high availability (HA) and data recovery feature of Exchange Server 2010. A database availability group, which can consist of up to 16 Exchange mailbox servers, automates recovery at the database-level after a database, server or network failure.
A directed acyclic graph (or DAG) is a digraph that has no cycles. Example of a DAG: Theorem Every finite DAG has at least one source, and at least one sink. In fact, given any vertex v, there is a path from some source to v, and a path from v to some sink.
A directed acyclic graph (DAG) is a conceptual representation of a series of activities. The order of the activities is depicted by a graph, which is visually presented as a set of circles, each one representing an activity, some of which are connected by lines, which represent the flow from one activity to another.
A directed graph is acyclic if it contains no cycles. That is, starting at any node in the graph, no sequence of edges exists that can be followed to loop back to that starting node.
If selects > manipulations, and especially subtree selects (all ancestors, all descendants) I'd go for a Closure-table approach. Yes, an explosion of paths in your path-table, but it does deliver results fast (as opposed to the adjacency model), and keeps updates limited to relevant portions (as opposed to 50% update with nested sets).
Bill Karwin has some nice presentation online about pros and cons of different models, see http://www.slideshare.net/billkarwin/models-for-hierarchical-data (slide 48 is an overview).
For DAGs in SQL databases there appeared to be only two solutions:
Recursive WITH clause.
Transitive closure
I'm not aware of any practical graph labeling scheme (like nested sets,intervals or materialized path)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With