Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to efficiently query a directed acyclic graph

I'm using mysql for one on my web applications. The application table contains a supervisor table and an employee table. employee table contains information about each employee. The supervisor table contains two columns as follows.

supervisor_id -> which is employee id of the supervisor
subordinate_id -> which is the employee id of the subordinate. 

Each subordinate can have multiple supervisors and one supervisors subordinates can be a supervisor of of some other employee. So the table records can be ass follows.

supervisor_id | subordinate_id
1             | 2
1             | 3
2             | 4
4             | 5
3             | 6
3             | 4

In the above example there is a supervisor chain. Supervisor 1 has 2, 3, 4, 5 and 6 as his subordinates. Supervisor 2 has 4, 5 as subordinates. And also it can have multiple supervisors for a subordinate.

When I querying for all subordinates for supervisor 2 currently I use a query like following.

public function getSubordinate($id) {
 $query = "SELECT * FROM supervisor WHERE subordinate_id = $id";
 // get results and return
}

So what I'm currently do is first send the id as 2 to get its immediate subordinates. Then for each and every resulting subordinates I run the query again and again to get the full subordinate chain.

This is okay with small set of data. But this supervisor table will have thousands of data so I have to do thousands of queries to find the supervisor chain and it takes times to give results.

Since subordinates can have multiple supervisors the nested set will not be a exact answer for this.

I went through this solution also . http://www.codeproject.com/Articles/22824/A-Model-to-Represent-Directed-Acyclic-Graphs-DAG-o

But when I use this method it will have millions of data with that table. and it is inefficient.

My problem is there any efficient way to do this. Is there any problem with my table structure which prevent me to do this kind of query efficiently.

like image 268
Thilanka Avatar asked Jun 22 '12 08:06

Thilanka


People also ask

Is there any tool to make and analyze directed acyclic graphs DAG )?

You can use optaplanner for the same.

How do you check if a graph is directed acyclic?

If a vertex can reach itself via a nontrivial path (a path with one or more edges), then that path is a cycle, so another way to define directed acyclic graphs is that they are the graphs in which no vertex can reach itself via a nontrivial path.

How do you find shortest path in directed acyclic graph explain with an example?

Given a Weighted Directed Acyclic Graph and a source vertex in the graph, find the shortest paths from given source to all other vertices. For a general weighted graph, we can calculate single source shortest distances in O(VE) time using Bellman–Ford Algorithm.

How does DAG help in optimization?

To apply an optimization technique to a basic block, a DAG is a three-address code that is generated as the result of an intermediate code generation. Directed acyclic graphs are a type of data structure and they are used to apply transformations to basic blocks.


1 Answers

All of the major database applications (including MySQL and MariaDB) now support recursive queries using Common Table Expressions. This was introduced in MySQL version 8.0 and MariaDB version 10.2.2. PostgreSQL had support even earlier. Oracle has it, and SQL Server added it with the 2005 version. In fact, a quick search suggests that Sqlite supports Common Table Expressions too.

Therefore, the answer you may be looking for is to use Common Table Expressions and recursive queries. Somes reasons why that is considered a better solution compared to "A Model to Represent Directed Acyclic Graphs (DAG) on SQL Databases" are explained here:

Encoding and Querying Graphs in the Relational Model
https://drtom.ch/posts/2012/02/11/Encoding_and_Querying_Graphs_in_the_Relational_Model/

(You can ignore the part where he says, "it won't run in particular on MySQL or sqlite3, which simply do not support CTEs." As I mentioned, that's no longer the case.)

As you noted in your question, "when I use this method it will have millions of data with that table." That alone might not be so bad if you were trading space for efficiency all around, but as Dr. Tom's post explains in one example:

The operation of deleting or inserting the red arc requires effort in θ(n^2) too.

That's n-squared effort for those operations; you gain query efficiency, but at the cost of space inefficiency and insertion/deletion inefficiency. He further goes on to point out that

practically all large real world networks are sparse. They have much fewer edges than there would be possible, i.e. m«n^2.

In fairness, the Code Project article by Kemal Erdogan that you linked is from 2008; CTE's were not universally available then. Furthermore, Erdogan made an informed choice in regard to the trade-off's as he explained here:

The solution that I have is based on a recursion [too]. However, instead of deferring the recursion until query time, I do the recursion at the insertion time, assuming that the graph is actually more queried than modified (which is true for all the cases that I have faced so far).

If, after reading Dr. Tom's article, you ultimately prefer Erdogan's trade-offs, you may be able to limit the other inefficiencies by taking a look at Laravel's implementation here:

GitHub - telkins/laravel-dag-manager: A SQL-based Directed Acyclic Graph (DAG) solution for Laravel. https://github.com/telkins/laravel-dag-manager

In particular, look at Max Hops and implement something like that in your own solution.

This is in the Laravel config file:

/*
|--------------------------------------------------------------------------
| Max Hops
|--------------------------------------------------------------------------
|
| This value represents the maximum number of hops that are allowed where
| hops "[i]ndicates how many vertex hops are necessary for the path; it is
| zero for direct edges".
|
| The more hops that are allowed (and used), then the more DAG edges will
| be created.  This will have an increasing impact on performance, space,
| and memory.  Whether or not it's negligible, noticeable, or impactful
| depends on a variety of factors.
*/

'max_hops' => 5,

Disclaimer: I am only now researching this for myself. I do not yet have the benefit of experience with any of these solutions.

like image 195
MountainX Avatar answered Nov 07 '22 05:11

MountainX