Efficient database query for ancestors on an acyclic directed graph

Tags: algorithm, database, graph, family-tree

Let's say I have a directed acyclic graph, such as a family "tree" (not really a tree, since a child has two parents). I want to store a representation of this graph in a relational database so that it's fast to compute all ancestors of a node and all descendants of a node. How would you represent this graph? How would you query for all descendants? How would you insert and remove nodes and relationships? What assumptions are you making about the data?

The best solution will have the lowest big-O bound on the number of select/insert/delete statements you run to query ancestors and descendants, with ties broken by the lowest big-O bound on total runtime, and further ties broken by space requirements.

My coworker posed this question to me. I have a solution, but its size is exponential in the worst case, so I wanted to see how other people would solve it.

Edit: Clarified that this is about relational databases. The question is trivial (and boring) if you use a graph database with built-in transitive closure.

asked Sep 20 '10 20:09

Dave Aaron Smith

2 Answers

If selects outnumber manipulations, and especially if you select whole subtrees (all ancestors, all descendants), I'd go for a closure-table approach. Yes, you get an explosion of paths in your path table, but it delivers results fast (as opposed to the adjacency model) and keeps updates limited to the relevant portions (as opposed to nested sets, where an update touches roughly 50% of the rows).

Bill Karwin has a nice presentation online about the pros and cons of the different models; see http://www.slideshare.net/billkarwin/models-for-hierarchical-data (slide 48 is an overview).
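To make the closure-table idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not taken from the presentation; the table name `closure`, the helper functions, and the sample family are all made up for illustration. It assumes every node gets a self-row `(n, n)`, so that adding an edge is a single `INSERT ... SELECT`, and each ancestor or descendant query is a single `SELECT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (ancestor, descendant) pair, including a self-row for every node.
conn.execute("""
    CREATE TABLE closure (
        ancestor   TEXT NOT NULL,
        descendant TEXT NOT NULL,
        PRIMARY KEY (ancestor, descendant)
    )
""")

def add_person(name):
    # The self-row lets add_parent() treat "the node itself" uniformly.
    conn.execute("INSERT OR IGNORE INTO closure VALUES (?, ?)", (name, name))

def add_parent(parent, child):
    # Link every ancestor-or-self of `parent` to every descendant-or-self
    # of `child`. OR IGNORE skips pairs already known via another path.
    conn.execute("""
        INSERT OR IGNORE INTO closure (ancestor, descendant)
        SELECT a.ancestor, d.descendant
        FROM closure AS a CROSS JOIN closure AS d
        WHERE a.descendant = ? AND d.ancestor = ?
    """, (parent, child))

def ancestors(name):
    rows = conn.execute(
        "SELECT ancestor FROM closure "
        "WHERE descendant = ? AND ancestor <> ? ORDER BY ancestor",
        (name, name))
    return [r[0] for r in rows]

def descendants(name):
    rows = conn.execute(
        "SELECT descendant FROM closure "
        "WHERE ancestor = ? AND descendant <> ? ORDER BY descendant",
        (name, name))
    return [r[0] for r in rows]

# Hypothetical sample data: two grandparents, two parents, one child.
for person in ["grandma", "grandpa", "mom", "dad", "kid"]:
    add_person(person)
for parent, child in [("mom", "kid"), ("dad", "kid"),
                      ("grandma", "mom"), ("grandpa", "mom")]:
    add_parent(parent, child)

print(ancestors("kid"))        # all ancestors in one SELECT
print(descendants("grandma"))  # all descendants in one SELECT
```

Note that this sketch only covers inserts. In a DAG, a pair can be reachable through more than one path, so removing an edge can't simply delete the pairs that edge produced; you would need extra bookkeeping (for example, a per-pair path count) or a rebuild of the affected rows.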

answered Sep 28 '22 00:09

Wrikken

For DAGs in SQL databases, there appear to be only two solutions:

1. Recursive WITH clause.
2. Transitive closure.

I'm not aware of any practical graph labeling scheme (like nested sets, intervals, or materialized path) for DAGs.
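For the recursive WITH option, a rough sketch might look like the following, again using Python's `sqlite3` so it runs as-is (the SQLite build needs to support `WITH RECURSIVE`). The `parent_of` edge table and the sample family are assumptions for illustration. The sample deliberately contains a node reachable through two paths; `UNION` (rather than `UNION ALL`) keeps it from being returned twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Plain adjacency list: one row per parent/child edge.
conn.execute("""
    CREATE TABLE parent_of (
        parent TEXT NOT NULL,
        child  TEXT NOT NULL,
        PRIMARY KEY (parent, child)
    )
""")
# Hypothetical data: "baby" descends from "grandma" via both "kid" and "cousin".
conn.executemany("INSERT INTO parent_of VALUES (?, ?)", [
    ("grandma", "mom"), ("grandma", "uncle"),
    ("mom", "kid"), ("uncle", "cousin"),
    ("kid", "baby"), ("cousin", "baby"),
])

ANCESTORS_SQL = """
    WITH RECURSIVE up(name) AS (
        SELECT parent FROM parent_of WHERE child = ?
        UNION  -- de-duplicates nodes reached through several paths
        SELECT p.parent FROM parent_of AS p JOIN up ON p.child = up.name
    )
    SELECT name FROM up ORDER BY name
"""

DESCENDANTS_SQL = """
    WITH RECURSIVE down(name) AS (
        SELECT child FROM parent_of WHERE parent = ?
        UNION
        SELECT p.child FROM parent_of AS p JOIN down ON p.parent = down.name
    )
    SELECT name FROM down ORDER BY name
"""

def ancestors(name):
    return [r[0] for r in conn.execute(ANCESTORS_SQL, (name,))]

def descendants(name):
    return [r[0] for r in conn.execute(DESCENDANTS_SQL, (name,))]

print(ancestors("baby"))
print(descendants("grandma"))
```

With this layout, inserting or removing a relationship is a single-row `INSERT` or `DELETE` on `parent_of`; the cost moves to query time, where the database walks the graph on every call.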

answered Sep 28 '22 01:09

Tegiri Nenashi
