I have a table representing a graph: Edges(from, to).
I'd like to query this table with "path queries", retrieving only the source and destination of the path.
For example, assume my table consists of the following rows:
+------+----+
| from | to |
+------+----+
| a | b |
| b | c |
| c | d |
| f | g |
| b | f |
| c | a |
+------+----+
Assume I execute the following (pseudo-)query:
SELECT ?v1, ?v2 WHERE ?v1 to ?t1, ?t1 to ?t2, ?t2 to ?v2;
This means I want all the pairs of source and destination that exist in all paths consisting of 4 nodes. Executing this query should return the following results:
+-----+-----+
| ?v1 | ?v2 |
+-----+-----+
| a | a |
| a | g |
| a | d |
+-----+-----+
Of course, paths consisting of a different number of nodes might be needed as well, the number 4 isn't hardcoded :-)
My questions are:
There are no self-edges (E.G "a - a").
There are no two identical rows.
Thanks in advance!
ad 1.) unless you know in advance that your paths will always be of a given length (or a small set of given lengths), you cannot express your query in pure sql. however, you may choose to incrementally maintain the transitive closure of your graph, in particular if
the technique is outlined in a paper by dong et al., doi://10.1.1.103.3314; don't be daunted by the theory and the math, they also provide sql code ready-to-use and their basic ideas are straightforward.
ad 2.)
if maintaining a transitive closure table is an option for you it wpould lend to one index on the pair of columns representing start and end vertices of paths.
if it isn't you might be able exploit the structure of your graph: for average fan-outs that are high (small) in comparison to fan-ins you should be better of with an index on the 'to' ('from') column.
if you cannot make an assumption on fan-out/fan-in ratios you're probably best off with an index on each column.
hope it helps,
best regards, carsten
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With