 

What is the most standard file format and notation for persisting expressive directed graphs?

I am interested in persisting individual directed graphs. This question is not asking for a full-scale graph database solution, but for a document format that I can use to save an individual, arbitrary directed graph. I don't know which notation and file format would be the smartest choice.

My primary concerns are:

  1. Expressiveness/Flexibility - I want the ability to express graphs of different types. While the standard use case would be a simple directed graph, it should be possible to express trees, cyclic graphs, and multigraphs. As a bare minimum, I would expect support for labeling and weighting of edges and nodes. Notations for describing higraphs and edge composition/hyper-edges would also be highly desirable, although I am aware that such solutions may not exist.

  2. Type System-Independence - I am interested in representing the structural qualities of graphs. Some solutions include an extensible type system for typed edges and nodes (e.g. RDF/OWL). I would only be interested in such a representation if there were a clearly defined canonical decomposition of typed elements into primitives (nodes/edges/attributes). What I am trying to avoid here is the possibility of multiple representations of equivalent graphs whose equivalence is not discernible.

  3. Canonical Representation - There should be a mechanism that allows the graph to be represented canonically (in such a way that lexical equivalence of canonical-representations could be used to determine equivalence).

  4. Presentation Independent - I would prefer a notation that is not dependent upon the presentation of the graph. This would include spatial orientation, colors, font, etc. I am only interested in representing the data. One of the features I don't like about DOT language, DGML or SVG (at least for this particular purpose) is the focus on visual representation.

  5. Standardized / Open / Compatible - The less implementation work I have to do, the better. If the format is standardized and reliable tools already exist for working with it, that makes it preferable. A related requirement is that the format should be highly compatible. The proprietary nature of Microsoft's DGML is a reason for my aversion to it, despite the Visual Studio tooling and the fact that I (currently) work primarily with .NET. The fact that the W3C publishes the RDF standards is a motivation for considering a limited subset of RDF as a representational tool. I also appreciate GXL and GraphML, because they have well-documented XML schemas, which makes it easier to integrate their data with any XML-compatible software package.

  6. Simplicity / Readability - I appreciate human-readable syntax and ease of interpretation. I also appreciate a representation that simplifies parsing. For this reason, I like GML, but I am concerned it is not mainstream enough to be a realistic choice. I would also consider JSON or YAML for readability, if they were not so limited in their respective abilities to represent complex (non-DAG) structures.

  7. Efficiency / Concise Representation - It's worth considering that whatever format I end up choosing will inevitably have to be persisted and transferred over some network. Therefore, file size is a relevant consideration.

Overview

I recognize that I will most likely be unable to find a solution that satisfies every criterion on my wishlist. I am simply asking for the file format that comes closest to what I want and that doesn't limit extensibility for unsupported use cases.

smartcaveman asked May 03 '13 17:05




3 Answers

ObWindyPreamble: in the RDF world, there are a gazillion different surface syntax formats to choose from. RDF itself is an abstract metamodel for data, not directly a "graph syntax". You can of course represent a graph directly in RDF (since RDF models are graphs), but given that you want to represent different kinds of graphs, you may end up having to abstract away from that and actually create an RDF vocabulary for representing different types of graphs.

All in all, I'm not convinced that RDF is the best way to go for you, but if you do choose it, I'd say that RDF's Turtle syntax is worth looking into. It certainly ticks the readability and simplicity boxes, as well as being a standard (well, almost: at the time of writing, the W3C is working on standardizing it) and having wide (open-source) tool support.

RDF models roughly follow set semantics, which means that a canonical syntax representation cannot really be enforced: two files can contain the same information in a different order without it affecting the actual model, and can even contain duplicate information. However, if you enforce a simple sorting algorithm when producing files (something for which most RDF parsers/writers have support), you should be able to get away with doing line-based comparisons and determining graph equivalence based on surface syntax.

Just as a simple example, let's assume we have a very simple, directed, labeled graph:

 A ---r1---> B ---r2---> C

You could represent this directly in RDF, as follows (using Turtle syntax):

 @prefix : <http://example.org/> .

 :A :r1 :B .
 :B :r2 :C .

In a more abstract modeling, you could do something like this:

 @prefix g: <http://example.org/graph-model/> .
 @prefix : <http://example.org/> .

 :A a g:Vertex .
 :B a g:Vertex .
 :C a g:Vertex .

 :r1 a g:DirectedEdge ;
     g:from :A ;
     g:to :B .
 :r2 a g:DirectedEdge ;
     g:from :B ;
     g:to :C .

The above is just a simplistic example of course, but hopefully it illustrates that this potentially meets quite a few of the things on your wish list.

By the way, if you want something even simpler, N-Triples is also an RDF syntax. It is line-based and therefore easy to process in a streaming fashion. It's slightly more verbose than Turtle, but it may make file comparison easier.
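
To make the sort-and-compare idea a bit more concrete, here is a rough Python sketch (not part of any RDF library). It assumes both files are N-Triples, contain only IRIs (no literals, no blank nodes), and have one triple per line. The function names `canonical_lines` and `same_graph` are made up for this illustration.

```python
def canonical_lines(ntriples_text: str) -> list[str]:
    """Return the triples as a sorted, de-duplicated list of lines.

    Assumption: IRI-only N-Triples, one triple per line. Collapsing
    whitespace like this would corrupt string literals, so literals
    are out of scope for this sketch.
    """
    lines = set()
    for raw in ntriples_text.splitlines():
        line = " ".join(raw.split())  # normalize spacing between terms
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        lines.add(line)  # set semantics: duplicates collapse
    return sorted(lines)


def same_graph(a: str, b: str) -> bool:
    """Compare two IRI-only N-Triples documents by canonical lines."""
    return canonical_lines(a) == canonical_lines(b)
```

This only works because every node has a globally unique name. As soon as blank nodes are involved, their labels are arbitrary, and a plain line comparison is no longer enough.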

Jeen Broekstra answered Oct 31 '22 10:10


My thoughts:

  • What I'm missing is your particular practical purpose/domain.

  • You mention the generic JSON format next to specific formats (e.g. GraphML, which is an application of XML). So I'm left wondering whether or not you are considering making your own format.

  • Wouldn't having a 'canonical representation that can be used to determine equivalence' solve the graph isomorphism problem?

  • GraphML seems to cover a lot of your theoretical requirements, so I'd suggest you create a JSON version of this. This would then also cover requirement 6.

  • Then, you could create a converter between the JSON format and GraphML (and possibly other formats).

  • For your requirement 7 it again all depends on the practical graph sizes. I mean, nowadays sending up to a few MB to a friggin mobile device is not considered much. A graph of a few MB in (about) any format you mention, is already a relatively large beast with tens of thousands of nodes & edges.
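
A rough sketch of what the JSON-to-GraphML converter suggested above could look like, in Python with only the standard library. The JSON layout (`nodes` with `id`, `edges` with `source`, `target` and an optional `label`) is purely an assumption made up for this example, as is the function name `json_to_graphml`; the GraphML element and attribute names are the standard ones.

```python
import json
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def json_to_graphml(json_text: str) -> str:
    """Convert a (hypothetical) JSON graph document to a GraphML string."""
    doc = json.loads(json_text)
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare one data key so that edges can carry a string label.
    ET.SubElement(root, "key", {
        "id": "label",
        "for": "edge",
        "attr.name": "label",
        "attr.type": "string",
    })
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "directed"})
    for node in doc["nodes"]:
        ET.SubElement(graph, "node", {"id": node["id"]})
    for edge in doc["edges"]:
        elem = ET.SubElement(graph, "edge", {
            "source": edge["source"],
            "target": edge["target"],
        })
        if "label" in edge:
            data = ET.SubElement(elem, "data", {"key": "label"})
            data.text = edge["label"]
    return ET.tostring(root, encoding="unicode")


example = json.dumps({
    "nodes": [{"id": "A"}, {"id": "B"}, {"id": "C"}],
    "edges": [
        {"source": "A", "target": "B", "label": "r1"},
        {"source": "B", "target": "C", "label": "r2"},
    ],
})
graphml_text = json_to_graphml(example)
```

Because edges refer to nodes by id rather than by nesting, cycles and parallel edges are not a problem in a JSON layout like this one.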

meaning-matters answered Oct 31 '22 10:10


What about the Trivial Graph Format (TGF)?
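
For reference, a TGF file is a list of node lines (`id label`), a line containing only `#`, and then a list of edge lines (`from to label`). Below is a small Python sketch of a reader for it. The function name `parse_tgf` is made up for this illustration, and the sample data encodes an A, B, C chain with edges labeled r1 and r2.

```python
def parse_tgf(text: str):
    """Parse Trivial Graph Format text into (nodes, edges).

    nodes: dict mapping node id -> label ("" if the label is omitted)
    edges: list of (source_id, target_id, label) tuples
    """
    nodes = {}
    edges = []
    in_edges = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "#":  # separator between the node and edge sections
            in_edges = True
            continue
        if in_edges:
            parts = line.split(maxsplit=2)
            label = parts[2] if len(parts) > 2 else ""
            edges.append((parts[0], parts[1], label))
        else:
            parts = line.split(maxsplit=1)
            nodes[parts[0]] = parts[1] if len(parts) > 1 else ""
    return nodes, edges


sample = """\
1 A
2 B
3 C
#
1 2 r1
2 3 r2
"""
nodes, edges = parse_tgf(sample)
```

It is about as simple and presentation-independent as it gets, but each node and edge only gets a single free-text label, so weights or other attributes would have to be encoded inside that label by your own convention.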

Julian Borrero answered Oct 31 '22 09:10