There are a bunch of both human- and machine-readable textual representations for a tree -- e.g. a nested list (in various representations -- e.g. JSON and YAML) and XML. Combined with indentation, they make it really easy to imagine the resulting structure.
But I don't see anything of the same level of readability for a Directed Acyclic Graph. It's a more general data structure than a tree so the above formats can't be used (verbatim, anyway).
The application I have in mind would be all sorts of flowcharts -- which e.g. naturally emerge in all sorts of planning tasks.
To limit the scope of the question, I'm primarily asking for standard solutions, or at least production-ready and proven to work in some areas of practice. If there are none, any experimental propositions that passed some sort of peer review (e.g. proposed in a published scientific paper) would have to do.
I'm going to go with a YAML nested list with anchors. (Which is equivalent to XML with entities but the latter has more noise.)
(I was already considering it but was wondering if anything better has been invented. Looks like it hasn't. But most importantly, @Patrick87 formally showed that it's an adequate representation.)
It's equivalent to the formal regular expression representation suggested by @Patrick87 if I replace composition with indent and union with no indent; and the anchors allow to eliminate the duplication of a subgraph under a node when it's referenced multiple times.
E.g. @GuyCoder's example
A->B A->C A->D B->E B->F C->E C->G D->F D->G E->H F->H G->H
that corresponds to A(B(E+F)+C(E+G)+D(F+G))H
≡ A(B(EH+FH)+C(EH+GH)+D(FH+GH))
would be
- A
- B
- &E E
- &H H
- &F F
- *H
- C
- *E
- &G G
- *H
- D
- *F
- *G
(For uniformity, every original node could be made an anchor if e.g. generating it.)
I doesn't matter if the graph is planar or not because any cross-cutting links are simply not "drawn".
As a bonus, it allows to specify the data attached to each node, as a hash table rooted in the node. (Though past a certain size, it'll probably be more clear to place the data separately.)
One good textual representation of graphs is the dot language from graphviz. The syntax is easily readable description of relationships.
Note that the core idea behind graphviz
is not to start with a graph and describe it but to not know what the graph looks like and start with what you know then let graphviz
generate the graph for you. Due to this design goal graphviz
does not have any feature for manually placing nodes - it will automatically draw nodes based on the algorithm you select.
Here's an example of what working with graphviz
is like:
Assume you want to draw an org chart of a company. You don't know what the graphs look like yet but you know who reports to who. You can describe the company as follows:
digraph {
CEO -> Board_of_Directors
CTO -> CEO
CFO -> CEO
COO -> CEO
DevLead -> CTO
DevTeam -> DevLead
DevOps -> CTO
Head_of_Accounting -> CFO
Accountants -> Head_of_Accounting
Procurement -> Head_of_Accounting
Procurement -> COO
Logistics -> COO
Tech_support -> CTO
Tech_support -> COO
}
Running this with the dot
algorithm will generate the following graph:
Graphviz does have sophisticated node and edge description features such as defining the shape of the nodes, labels for edges etc. but adding such detail often reduces readability of the graph a bit because the majority of the code now looks like style definitions instead of relationships between nodes. Still, I think the graph definition itself is fairly clean. Like any language, it was intended to be used to solve human problems (in this case figuring out what a graph looks like if you know the states and transitions) and has to be at least somewhat usable for its intended use.
This is a simple variation of Visibility Representations. See: Planar Orthogonal and Polyline Drawing Algorithms Section: 7.2.3 Visibility Representations
A -> B
A -> C
A -> E
B -> D
B -> F
C -> D
C -> G
D -> H
E -> F
E -> G
E -> G
F -> H
G--------G--------G
| | |
| H--H--H |
| | | |
| F---F D---D |
| | | | | |
E-E B--B--B C-C
| | |
A------A------A
Here is this
A -> B
A -> C
A -> D
B -> E
B -> F
C -> E
C -> G
D -> F
D -> G
E -> H
F -> H
G -> H
)|(
represents a bridge, no connection.
H--H--H
| | |
E-------E | G----G--G
| | | | |
| F---)|(-F-)|(---F |
| | | | | |
B--B C--C--C D--D
| | |
A----------A----------A
From: Orthogonal Graph Drawing with Constraints Figure 3.4 (a)
a -> b
a -> c
a -> d
b -> c
b -> e
c -> f
d -> e
d -> f
d -> g
e -> f
e -> h
f -> i
g -> h
g -> i
h -> i
i---i-------i
| | |
f---f---f h---h |
| | | | | |
| | e---e g---g
| | | | |
| d---d--)|(--d
| | |
c---c | |
| | | |
| b-------b |
| | |
a---a-----------a
In formal languages and automata theory, one of the first important results is the equivalence between deterministic finite automata and (formal) regular expressions. A DFA is just a labeled digraph with some extra information. I propose the following: (1) consider all states in the DAG to be accepting (2) label each arc with the label of the vertex where that arc begins (3) produce a regular expression for this DFA (choosing the topologically minimal states as starting states of distinct DFAs). First you'd want to topologically sort the vertices and then make one DFA/regular expression pair per connected component.
Example: node A goes to nodes B and C, B goes do D and E, C goes to E and F.
Topologically sorting finds the vertices in alphabetical order by label are already sorted: A, B, C, D, E. There is one connected component starting at the topologically minimal node A.
Going through the algorithm after labeling arcs might give you a regular expression like A(B(D+E)+C(E+F)).
Note that because the graphs are acyclic, you will never need the Kleene closure symbol asterisk/star. This answer could be elaborated on if there is interest.
EDIT: Some elaboration.
It was pointed out in the comments that there is some duplication in the above regular expression. This is true. However, this may not turn into catastrophic duplication: we can avoid duplicating subgraphs, at least to some degree. For instance, suppose there's a long subgraph with topologically minimal node E in the above example. Let's say it has acceptable regular expression r. Then we could adjust the above regular expression as follows: A((B+C)Er + BD + CF). There is still duplication, now between B and C rather than E, but because of the subgraph with topologically minimal node E, this is still a more concise representation.
Minimizing general regular expressions is PSPACE-complete. However, I don't know whether this bound applies to the minimization of regular expressions generated from DFAs whose graphs are DAGs. As the comments correctly note, the regular theory of DFAs and REs treats general digraphs, which are more complex than DAGs. It's entirely possible that regular expressions without the Kleene star might be easier to minimize than ones with it. This might be a question worth asking separately, possible on cs.stackexchange.com.
Another common way of representing DFAs is by using regular grammars. This is essentially equivalent to simply listing the ordered pairs of nodes corresponding to arcs in the DFA's state graph.
EDIT2: an example
Another answer had this example:
A->B
A->C
A->D
B->E
B->F
C->E
C->G
D->F
D->G
E->H
F->H
G->H
I suspect a nearly minimal regular expression as described here would be approximately the following: A(B(E+F)+C(E+G)+D(F+G))H. Our representation compares as follows:
It's harder to compare to the visibility graph, since the representations are so different, but the regular expression clearly uses fewer total symbols (if we are counting symbols).
EDIT 3: another example was proposed, as follows:
a -> b
a -> c
a -> d
b -> c
b -> e
c -> f
d -> e
d -> f
d -> g
e -> f
e -> h
f -> i
g -> h
g -> i
h -> i
The most concise regular expression I could come up with by hand (second guess, no methodology) is this: a((b+d)e(f+h)+(bc+c+d)f+dg(h+#))i . Note the # is not part of the usual formal regular expression syntax and represents the empty regular expression, that is, it generates the language consisting of the empty string. This is a convenience to allow better minimzation and adds no computational power, only expressive. If you don't like it, you can use dgh + dh instead, it's only one symbol longer. This representation is still around half the size of the original grammar. Actually, the comparisons to both the grammar and the visibility graph are similar w.r.t. those of the last example.
EDIT 4: I will now expand the expression from the last example to show it is a factored representation of a disjoint union of paths through the DAG.
a((b+d)e(f+h)+(bc+c+d)f+dg(h+#))i
a((b+d)e(f+h)+(bc+c+d)f+dgh+dg)i
a((b+d)ef+(b+d)eh+(bc+c+d)f+dgh+dg)i
a(bef+def+beh+deh+(bc+c+d)f+dgh+dg)i
a(bef+def+beh+deh+bcf+cf+df+dgh+dg)i
abefi+adefi+abehi+adehi+abcfi+acfi+adfi+adghi+adgi
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With