What is the most standard file format and notation for persisting expressive directed graphs?

Tags: data-structures, directed-graph, file-format, rdf, graphml

I am interested in persisting individual directed graphs. This question is not asking for a full-scale graph database solution, but for a document format that I can use to save an individual, arbitrary directed graph. I don't know which notation and file format would be the smartest choice.

My primary concerns are:

1. Expressiveness/Flexibility - I want the ability to express graphs of different types. While the standard use case would be a simple directed graph, it should be possible to express trees, cyclic graphs, and multigraphs. As a bare minimum, I would expect support for labeling and weighting of edges and nodes. Notations for describing higraphs and edge composition/hyperedges would also be highly desirable, although I am aware that such solutions may not exist.
2. Type System-Independence - I am interested in representing the structural qualities of graphs. Some solutions include an extensible type system for typed edges and nodes (e.g. RDF/OWL). I would only be interested in such a representation if there were a clearly defined canonical decomposition of typed elements into primitives (nodes/edges/attributes). What I am trying to avoid here is the possibility of multiple representations of equivalent graphs where the equivalence is not discernible.
3. Canonical Representation - There should be a mechanism that allows the graph to be represented canonically (in such a way that lexical equivalence of canonical representations could be used to determine equivalence).
4. Presentation Independent - I would prefer a notation that does not depend on the presentation of the graph. This includes spatial orientation, colors, fonts, etc. I am only interested in representing the data. One of the features I don't like about the DOT language, DGML or SVG (at least for this particular purpose) is the focus on visual representation.
5. Standardized / Open / Compatible - The less implementation work that I have to do, the better. A format that is standardized and already has reliable tooling is preferable. Accompanying this requirement is another: the format should be highly compatible. The proprietary nature of Microsoft's DGML is a reason for my aversion, despite the Visual Studio tooling and the fact that I work primarily with .NET (now). The fact that W3C publishes RDF standards is a motivation for considering a limited subset of RDF as a representational tool. I also appreciate GXL and GraphML, because they have well-documented XML schemas, which makes it easier to integrate their data with any XML-compatible software package.
6. Simplicity / Readability - I appreciate human-readable syntax and ease of interpretation. I also appreciate a representation that simplifies parsing. For this reason, I like GML, but I am concerned it is not mainstream enough to be a realistic choice. I would also consider JSON or YAML for readability, if they were not so limited in their respective abilities to represent complex (non-DAG) structures.
7. Efficiency / Concise Representation - Whatever format I end up choosing will inevitably have to be persisted and transferred over some network. Therefore, file size is a relevant consideration.

Overview

I recognize that I will most likely be unable to find a solution that satisfies every criterion on my wishlist. I am simply asking for the file format that is closest to what I want and that doesn't limit extensibility for unsupported use cases.

asked May 03 '13 17:05

smartcaveman

3 Answers

ObWindyPreamble: in the RDF world, there are a gazillion different surface syntax formats to choose from. RDF itself is an abstract metamodel for data, not directly a "graph syntax". You can of course directly represent a graph in RDF (since RDF models are graphs), but given that you want to represent different kinds of graphs you may end up with having to abstract away, and actually create an RDF vocabulary for representing different types of graphs.

All in all, I'm not convinced that RDF is the best way to go for you, but if you'd choose one, I'd say that RDF's Turtle syntax is something worth looking into. It certainly ticks the readability and simplicity boxes, as well as being a standard (well, almost... W3C is working on standardizing it) and having wide (open-source) tool support.

RDF models roughly follow set semantics, which means that a canonical syntax representation cannot really be enforced: two files can list the same information in a different order, or can even contain duplicate statements, without affecting the actual model. However, if you enforce a simple sorting algorithm when producing files (something most RDF parsers/writers support), you should be able to get away with line-based comparisons and determine graph equivalence from the surface syntax.
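As a rough illustration of that sorting idea, here is a minimal Python sketch. It assumes the graph is serialized as N-Triples with IRI terms only (no blank nodes and no literals, both of which would need more careful handling), and the function name `canonical_ntriples` is made up for this example rather than taken from any RDF library.

```python
def canonical_ntriples(text: str) -> str:
    """Return a sorted, de-duplicated form of an IRI-only N-Triples document.

    Assumption: every term is an IRI. Literals may contain significant
    whitespace and blank node labels are arbitrary, so neither is handled here.
    """
    statements = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Collapse runs of whitespace so spacing differences do not matter.
        statements.add(" ".join(line.split()))
    # A set removes duplicates; sorting fixes the order.
    return "\n".join(sorted(statements)) + "\n"


if __name__ == "__main__":
    first = (
        "<http://example.org/B> <http://example.org/r2> <http://example.org/C> .\n"
        "<http://example.org/A> <http://example.org/r1> <http://example.org/B> .\n"
    )
    second = (
        "<http://example.org/A>  <http://example.org/r1> <http://example.org/B> .\n"
        "<http://example.org/B> <http://example.org/r2> <http://example.org/C> .\n"
        "<http://example.org/A> <http://example.org/r1> <http://example.org/B> .\n"
    )
    print(canonical_ntriples(first) == canonical_ntriples(second))
```

Two files that differ only in statement order, spacing, or duplicated lines then compare equal as plain strings. Note that this only detects equality when both graphs use the same node identifiers.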

Just as a simple example, let's assume we have a very simple, directed, labeled graph:

 A ---r1---> B ---r2---> C

You could represent this directly in RDF, as follows (using Turtle syntax):

 @prefix : <http://example.org/> .

 :A :r1 :B .
 :B :r2 :C .

In a more abstract modeling, you could do something like this:

 @prefix g: <http://example.org/graph-model/> .
 @prefix : <http://example.org/> .

 :A a g:Vertex .
 :B a g:Vertex .
 :C a g:Vertex .

 :r1 a g:DirectedEdge ;
     g:from :A ;
     g:to :B .
 :r2 a g:DirectedEdge ;
     g:from :B ;
     g:to :C .

The above is just a simplistic example of course, but hopefully it illustrates that this potentially meets quite a few of the things on your wish list.
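To make the second modeling style a bit more concrete, the following Python sketch generates that kind of Turtle from a plain edge list. The `g:` vocabulary is the hypothetical one used in the example above, not a published standard, and `edges_to_turtle` is an illustrative helper name.

```python
def edges_to_turtle(edges):
    """Render (edge_name, source, target) tuples as Turtle.

    Assumption: all names are simple local names that are valid after a
    ':' prefix, and the g: vocabulary is the made-up one from the example.
    """
    edges = sorted(edges)  # fixed order, and safe if a generator was passed
    vertices = sorted({v for _, source, target in edges for v in (source, target)})
    lines = [
        "@prefix g: <http://example.org/graph-model/> .",
        "@prefix : <http://example.org/> .",
        "",
    ]
    lines += [f":{v} a g:Vertex ." for v in vertices]
    lines.append("")
    for name, source, target in edges:
        lines += [
            f":{name} a g:DirectedEdge ;",
            f"    g:from :{source} ;",
            f"    g:to :{target} .",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(edges_to_turtle([("r1", "A", "B"), ("r2", "B", "C")]))
```

One reason to prefer this style: because every edge gets its own identifier, two parallel edges between the same pair of vertices stay distinct, whereas in the direct form a repeated triple collapses into one statement.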

By the way, if you want even simpler, N-Triples is also an RDF syntax, which is line-based and therefore easy to process in a streaming fashion. It's slightly more verbose than Turtle but it may make file comparison easier.
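To show what "line-based" buys you, here is a small Python sketch that reads N-Triples one line at a time and counts outgoing edges per node without building the whole graph in memory. It assumes IRI-only statements; `iter_triples` and `out_degree` are names invented for this example.

```python
import io
from collections import Counter


def iter_triples(lines):
    """Yield (subject, predicate, object) from IRI-only N-Triples lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.rstrip(".").split()
        if len(parts) != 3:
            raise ValueError(f"not an IRI-only triple: {line!r}")
        yield tuple(part.strip("<>") for part in parts)


def out_degree(lines):
    """Count outgoing edges per subject while streaming over the input."""
    return Counter(subject for subject, _, _ in iter_triples(lines))


if __name__ == "__main__":
    data = io.StringIO(
        "<http://example.org/A> <http://example.org/r1> <http://example.org/B> .\n"
        "<http://example.org/B> <http://example.org/r2> <http://example.org/C> .\n"
    )
    for node, degree in out_degree(data).items():
        print(node, degree)
```

Any iterable of lines works as input, so an open file handle can be passed in directly.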

answered Oct 31 '22 10:10

Jeen Broekstra

My thoughts:

What I'm missing is your particular practical purpose/domain.
You mention the generic JSON format next to specific formats (e.g. GraphML which is an application of XML). So I'm left with the question if you do or don't consider making your own format.
Wouldn't having a 'canonical representation that can be used to determine equivalence' solve the graph isomorphism problem?
GraphML seems to cover a lot of your theoretical requirements, so I'd suggest you create a JSON version of this. This would then also cover requirement 6.
Then, you could create a converter between the JSON format and GraphML (and possibly other formats).
For your requirement 7 it again all depends on the practical graph sizes. I mean, nowadays sending up to a few MB to a friggin mobile device is not considered much. A graph of a few MB in (about) any format you mention, is already a relatively large beast with tens of thousands of nodes & edges.
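A rough sketch of what such a JSON version could look like, in Python. The field names (`directed`, `nodes`, `edges`, `source`, `target`) are assumptions loosely modeled on GraphML's node and edge elements, not an existing standard. Because edges refer to nodes by id instead of nesting them, cycles and parallel edges pose no problem.

```python
import json


def graph_to_json(nodes, edges):
    """Serialize a directed graph as compact JSON with a fixed ordering.

    nodes: dict mapping node id -> dict of attributes
    edges: list of dicts with "source", "target" and optional attributes
    """
    doc = {
        "directed": True,
        "nodes": [{"id": node_id, **attrs} for node_id, attrs in sorted(nodes.items())],
        "edges": sorted(
            edges,
            key=lambda e: (e["source"], e["target"], e.get("label", "")),
        ),
    }
    # sort_keys and compact separators make the output deterministic and small.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"))


if __name__ == "__main__":
    nodes = {"A": {"weight": 1}, "B": {"weight": 2}}
    edges = [
        {"source": "B", "target": "A", "label": "r2"},
        {"source": "A", "target": "B", "label": "r1"},  # closes the cycle A -> B -> A
    ]
    print(graph_to_json(nodes, edges))
```

The ordering makes two files comparable as text only when both use the same node ids. It does not detect that two differently labeled graphs are isomorphic, which is the harder problem raised in the third point above.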

answered Oct 31 '22 10:10

meaning-matters

What about Trivial Graph Format:
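For reference, a TGF file lists one node per line (an id, optionally followed by a label), then a line containing only `#`, then one edge per line (source id, target id, optional label). A minimal Python sketch of a writer and reader under that description, with `write_tgf` and `read_tgf` being illustrative names:

```python
def write_tgf(nodes, edges):
    """nodes: list of (id, label); edges: list of (source, target, label)."""
    lines = [f"{node_id} {label}".rstrip() for node_id, label in nodes]
    lines.append("#")  # separator between the node and edge sections
    lines += [f"{src} {dst} {label}".rstrip() for src, dst, label in edges]
    return "\n".join(lines) + "\n"


def read_tgf(text):
    """Parse TGF text back into (nodes, edges); missing labels become ''."""
    nodes, edges = [], []
    in_edges = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "#":
            in_edges = True
            continue
        if in_edges:
            src, dst, *rest = line.split(None, 2)
            edges.append((src, dst, rest[0] if rest else ""))
        else:
            node_id, *rest = line.split(None, 1)
            nodes.append((node_id, rest[0] if rest else ""))
    return nodes, edges


if __name__ == "__main__":
    text = write_tgf(
        [("1", "A"), ("2", "B"), ("3", "C")],
        [("1", "2", "r1"), ("2", "3", "r2")],
    )
    print(text)
    print(read_tgf(text))
```

The format carries only one free-text label per node and edge, so weights or further attributes would have to be encoded inside the label by convention.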

answered Oct 31 '22 09:10

Julian Borrero
