Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A human-readable textual representation of a Directed Acycling Graph

There are a bunch of both human- and machine-readable textual representations for a tree -- e.g. a nested list (in various representations -- e.g. JSON and YAML) and XML. Combined with indentation, they make it really easy to imagine the resulting structure.

But I don't see anything of the same level of readability for a Directed Acyclic Graph. It's a more general data structure than a tree so the above formats can't be used (verbatim, anyway).

  • All human-readable representations that I've ever seen were graphical
  • The raw textual representation would be to list all nodes and their connections -- which makes it hard to imagine the graph if there are more than a few nodes

The application I have in mind would be all sorts of flowcharts -- which e.g. naturally emerge in all sorts of planning tasks.


To limit the scope of the question, I'm primarily asking for standard solutions, or at least production-ready and proven to work in some areas of practice. If there are none, any experimental propositions that passed some sort of peer review (e.g. proposed in a published scientific paper) would have to do.

like image 869
ivan_pozdeev Avatar asked Sep 13 '19 15:09

ivan_pozdeev


4 Answers

I'm going to go with a YAML nested list with anchors. (Which is equivalent to XML with entities but the latter has more noise.)
(I was already considering it but was wondering if anything better has been invented. Looks like it hasn't. But most importantly, @Patrick87 formally showed that it's an adequate representation.)

It's equivalent to the formal regular expression representation suggested by @Patrick87 if I replace composition with indent and union with no indent; and the anchors allow to eliminate the duplication of a subgraph under a node when it's referenced multiple times.


E.g. @GuyCoder's example

A->B
A->C
A->D
B->E
B->F
C->E
C->G
D->F
D->G
E->H
F->H
G->H

that corresponds to A(B(E+F)+C(E+G)+D(F+G))HA(B(EH+FH)+C(EH+GH)+D(FH+GH))

would be

- A
    - B
        - &E E
            - &H H
        - &F F
            - *H
    - C
        - *E
        - &G G
            - *H
    - D
        - *F
        - *G

(For uniformity, every original node could be made an anchor if e.g. generating it.)


I doesn't matter if the graph is planar or not because any cross-cutting links are simply not "drawn".

As a bonus, it allows to specify the data attached to each node, as a hash table rooted in the node. (Though past a certain size, it'll probably be more clear to place the data separately.)

like image 197
ivan_pozdeev Avatar answered Oct 07 '22 13:10

ivan_pozdeev


One good textual representation of graphs is the dot language from graphviz. The syntax is easily readable description of relationships.

Note that the core idea behind graphviz is not to start with a graph and describe it but to not know what the graph looks like and start with what you know then let graphviz generate the graph for you. Due to this design goal graphviz does not have any feature for manually placing nodes - it will automatically draw nodes based on the algorithm you select.

Here's an example of what working with graphviz is like:

Assume you want to draw an org chart of a company. You don't know what the graphs look like yet but you know who reports to who. You can describe the company as follows:

digraph {
    CEO                ->  Board_of_Directors
    CTO                ->  CEO
    CFO                ->  CEO
    COO                ->  CEO
    DevLead            ->  CTO
    DevTeam            ->  DevLead
    DevOps             ->  CTO
    Head_of_Accounting ->  CFO
    Accountants        ->  Head_of_Accounting
    Procurement        ->  Head_of_Accounting
    Procurement        ->  COO
    Logistics          ->  COO
    Tech_support       ->  CTO
    Tech_support       ->  COO
}

Running this with the dot algorithm will generate the following graph:

Org Chart

Graphviz does have sophisticated node and edge description features such as defining the shape of the nodes, labels for edges etc. but adding such detail often reduces readability of the graph a bit because the majority of the code now looks like style definitions instead of relationships between nodes. Still, I think the graph definition itself is fairly clean. Like any language, it was intended to be used to solve human problems (in this case figuring out what a graph looks like if you know the states and transitions) and has to be at least somewhat usable for its intended use.

like image 42
slebetman Avatar answered Oct 07 '22 13:10

slebetman


This is a simple variation of Visibility Representations. See: Planar Orthogonal and Polyline Drawing Algorithms Section: 7.2.3 Visibility Representations

A -> B
A -> C
A -> E
B -> D
B -> F
C -> D
C -> G
D -> H
E -> F
E -> G
E -> G
F -> H
 G--------G--------G                
 |        |        |        
 |     H--H--H     |    
 |     |     |     |
 | F---F     D---D |     
 | |   |     |   | |   
 E-E   B--B--B   C-C  
   |      |      |       
   A------A------A

Here is this

A -> B
A -> C
A -> D
B -> E
B -> F
C -> E
C -> G
D -> F
D -> G
E -> H
F -> H
G -> H

)|( represents a bridge, no connection.

          H--H--H          
          |  |  |           
  E-------E  |  G----G--G   
  |       |  |  |       |   
  |  F---)|(-F-)|(---F  |   
  |  |    |     |    |  |   
  B--B    C--C--C    D--D   
  |          |          |   
  A----------A----------A 

From: Orthogonal Graph Drawing with Constraints Figure 3.4 (a)

a -> b
a -> c
a -> d
b -> c
b -> e
c -> f
d -> e
d -> f
d -> g
e -> f
e -> h
f -> i
g -> h
g -> i
h -> i
         i---i-------i   
         |   |       |       
 f---f---f   h---h   |   
 |   |   |   |   |   |   
 |   |   e---e   g---g  
 |   |   |   |   |       
 |   d---d--)|(--d       
 |           |   |        
 c---c       |   |        
 |   |       |   |        
 |   b-------b   |        
 |   |           |        
 a---a-----------a    
like image 21
Guy Coder Avatar answered Oct 07 '22 13:10

Guy Coder


In formal languages and automata theory, one of the first important results is the equivalence between deterministic finite automata and (formal) regular expressions. A DFA is just a labeled digraph with some extra information. I propose the following: (1) consider all states in the DAG to be accepting (2) label each arc with the label of the vertex where that arc begins (3) produce a regular expression for this DFA (choosing the topologically minimal states as starting states of distinct DFAs). First you'd want to topologically sort the vertices and then make one DFA/regular expression pair per connected component.

Example: node A goes to nodes B and C, B goes do D and E, C goes to E and F.

Topologically sorting finds the vertices in alphabetical order by label are already sorted: A, B, C, D, E. There is one connected component starting at the topologically minimal node A.

Going through the algorithm after labeling arcs might give you a regular expression like A(B(D+E)+C(E+F)).

Note that because the graphs are acyclic, you will never need the Kleene closure symbol asterisk/star. This answer could be elaborated on if there is interest.

EDIT: Some elaboration.

It was pointed out in the comments that there is some duplication in the above regular expression. This is true. However, this may not turn into catastrophic duplication: we can avoid duplicating subgraphs, at least to some degree. For instance, suppose there's a long subgraph with topologically minimal node E in the above example. Let's say it has acceptable regular expression r. Then we could adjust the above regular expression as follows: A((B+C)Er + BD + CF). There is still duplication, now between B and C rather than E, but because of the subgraph with topologically minimal node E, this is still a more concise representation.

Minimizing general regular expressions is PSPACE-complete. However, I don't know whether this bound applies to the minimization of regular expressions generated from DFAs whose graphs are DAGs. As the comments correctly note, the regular theory of DFAs and REs treats general digraphs, which are more complex than DAGs. It's entirely possible that regular expressions without the Kleene star might be easier to minimize than ones with it. This might be a question worth asking separately, possible on cs.stackexchange.com.

Another common way of representing DFAs is by using regular grammars. This is essentially equivalent to simply listing the ordered pairs of nodes corresponding to arcs in the DFA's state graph.

EDIT2: an example

Another answer had this example:

A->B
A->C
A->D
B->E
B->F
C->E
C->G
D->F
D->G
E->H
F->H
G->H

I suspect a nearly minimal regular expression as described here would be approximately the following: A(B(E+F)+C(E+G)+D(F+G))H. Our representation compares as follows:

  • 24 symbols in grammar-equivalent, but 11 in regular expression
  • 12 operators in grammar-equivalent, but 8 in regular expression
  • 36 vs 19 total tokens

It's harder to compare to the visibility graph, since the representations are so different, but the regular expression clearly uses fewer total symbols (if we are counting symbols).

EDIT 3: another example was proposed, as follows:

a -> b
a -> c
a -> d
b -> c
b -> e
c -> f
d -> e
d -> f
d -> g
e -> f
e -> h
f -> i
g -> h
g -> i
h -> i

The most concise regular expression I could come up with by hand (second guess, no methodology) is this: a((b+d)e(f+h)+(bc+c+d)f+dg(h+#))i . Note the # is not part of the usual formal regular expression syntax and represents the empty regular expression, that is, it generates the language consisting of the empty string. This is a convenience to allow better minimzation and adds no computational power, only expressive. If you don't like it, you can use dgh + dh instead, it's only one symbol longer. This representation is still around half the size of the original grammar. Actually, the comparisons to both the grammar and the visibility graph are similar w.r.t. those of the last example.

EDIT 4: I will now expand the expression from the last example to show it is a factored representation of a disjoint union of paths through the DAG.

a((b+d)e(f+h)+(bc+c+d)f+dg(h+#))i
a((b+d)e(f+h)+(bc+c+d)f+dgh+dg)i
a((b+d)ef+(b+d)eh+(bc+c+d)f+dgh+dg)i
a(bef+def+beh+deh+(bc+c+d)f+dgh+dg)i
a(bef+def+beh+deh+bcf+cf+df+dgh+dg)i
abefi+adefi+abehi+adehi+abcfi+acfi+adfi+adghi+adgi
like image 1
Patrick87 Avatar answered Oct 07 '22 11:10

Patrick87