How to serialize a graph structure?


How do you represent your graph in memory?
Basically you have two (good) options:

  • an adjacency list representation
  • an adjacency matrix representation

The adjacency list representation is best suited to sparse graphs, and the matrix representation to dense graphs.

If you used one of these representations, you could serialize that representation directly.

If it has to be human readable you could still opt for creating your own serialization algorithm. For example you could write down the matrix representation like you would do with any "normal" matrix: just print out the columns and rows, and all the data in it like so:

   1  2  3
1 #t #f #f
2 #f #f #t
3 #f #t #f

(This is a non-optimized, unweighted representation, but it can also be used for directed graphs.)
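
A rough Python sketch of that idea, in case it helps. The function names dump_matrix and load_matrix are made up for illustration, and the fixed two-character column width assumes a small graph (fewer than 100 nodes) so the columns stay aligned:

```python
def dump_matrix(matrix):
    """Render a boolean adjacency matrix with 1-based row and column labels."""
    n = len(matrix)
    label_w = len(str(n))  # width of the row-label column
    header = " " * label_w + "".join(f" {c:>2}" for c in range(1, n + 1))
    rows = [
        f"{r:>{label_w}}" + "".join(" #t" if cell else " #f" for cell in row)
        for r, row in enumerate(matrix, start=1)
    ]
    return "\n".join([header] + rows)


def load_matrix(text):
    """Parse the text produced by dump_matrix back into a list of lists."""
    lines = text.strip("\n").split("\n")
    # Skip the header line, then drop the leading row label on each row.
    return [[tok == "#t" for tok in line.split()[1:]] for line in lines[1:]]


matrix = [
    [True, False, False],
    [False, False, True],
    [False, True, False],
]
print(dump_matrix(matrix))
```

Reading it back with load_matrix gives the original matrix, so the text form loses nothing for an unweighted graph.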


Typically relationships in XML are shown by the parent/child relationship. XML can handle graph data but not in this manner. To handle graphs in XML you should use the xs:ID and xs:IDREF schema types.

In an example, assume that node/@id is an xs:ID type and that link/@ref is an xs:IDREF type. The following XML shows the cycle of three nodes n1 -> n2 -> n3 -> n1. (The ids carry a letter prefix because an xs:ID value must be a valid XML name, which cannot start with a digit.)

<data>
  <node id="n1">
    <link ref="n2"/>
  </node>
  <node id="n2">
    <link ref="n3"/>
  </node>
  <node id="n3">
    <link ref="n1"/>
  </node>
</data>

Many development tools support ID and IDREF too. I have used Java's JAXB (Java Architecture for XML Binding), which supports them through the @XmlID and @XmlIDREF annotations. You can build your graph using plain Java objects and then use JAXB to handle the actual serialization to XML.
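
If you are not on Java, the same node/link layout can be written with any XML library. Below is a minimal sketch using Python's standard xml.etree.ElementTree. The function names graph_to_xml and xml_to_graph are hypothetical, and because ElementTree does not validate against a schema, the IDREF rule (every ref must point at an existing id) is checked by hand here:

```python
import xml.etree.ElementTree as ET


def graph_to_xml(adjacency):
    """adjacency maps a node id to the list of node ids it links to."""
    root = ET.Element("data")
    for node_id, targets in adjacency.items():
        node = ET.SubElement(root, "node", id=str(node_id))
        for target in targets:
            ET.SubElement(node, "link", ref=str(target))
    return ET.tostring(root, encoding="unicode")


def xml_to_graph(xml_text):
    """Rebuild the adjacency mapping and check that every ref resolves."""
    root = ET.fromstring(xml_text)
    adjacency = {
        node.get("id"): [link.get("ref") for link in node.findall("link")]
        for node in root.findall("node")
    }
    for node_id, targets in adjacency.items():
        for target in targets:
            if target not in adjacency:
                raise ValueError(f"node {node_id} links to unknown id {target}")
    return adjacency


cycle = {"n1": ["n2"], "n2": ["n3"], "n3": ["n1"]}
xml_text = graph_to_xml(cycle)
print(xml_text)
```

Because links are stored as references rather than nested elements, the cycle round-trips without any special handling.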


XML is very verbose. Whenever I need to serialize a graph, I roll my own format. Here's an example of a 3-node directed acyclic graph: node labels first, then a separator, then one edge per line. It's pretty compact and does everything I need it to do:

0: foo
1: bar
2: bat
----
0 1
0 2
1 2
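
For what it's worth, a reader and writer for a format like this fits in a few lines. This is only a sketch in Python: the names dump_graph and load_graph are invented here, and it assumes labels contain no newlines and that node indices are listed in order starting from 0:

```python
def dump_graph(labels, edges):
    """Write 'index: label' lines, a '----' separator, then 'src dst' lines."""
    lines = [f"{i}: {label}" for i, label in enumerate(labels)]
    lines.append("----")
    lines.extend(f"{src} {dst}" for src, dst in edges)
    return "\n".join(lines)


def load_graph(text):
    """Parse the format above into (labels, edges)."""
    labels, edges = [], []
    in_edges = False
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        if line.strip() == "----":
            in_edges = True
        elif in_edges:
            src, dst = line.split()
            edges.append((int(src), int(dst)))
        else:
            index, label = line.split(": ", 1)
            if int(index) != len(labels):
                raise ValueError(f"unexpected node index {index}")
            labels.append(label)
    return labels, edges


text = dump_graph(["foo", "bar", "bat"], [(0, 1), (0, 2), (1, 2)])
print(text)
```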

One example you might be familiar with is Java serialization. This effectively serializes an object graph, with each object instance being a node and each reference being an edge. The algorithm is recursive but skips objects it has already written, so cycles and shared references do not cause repeated work. The pseudocode would be:

serialize(x, done):
    # done is a set of already-serialized objects, shared by all recursive calls
    # (start with: serialize(root, empty set))
    if x is in done then return
    otherwise:
         record x as serialized in done
         record properties of x
         for each neighbour/child c of x: serialize(c, done)
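
To make the pseudocode concrete, here is a small Python sketch of the same visited-set idea. It is not Java's actual wire format. The Node class and the record layout are assumptions made up for this example, and each object is given an integer handle so that edges can be written as handles instead of nested copies:

```python
class Node:
    """Hypothetical graph node: a name plus references to other nodes."""

    def __init__(self, name):
        self.name = name
        self.children = []


def serialize(node, done=None, out=None):
    """Return a list of records; each record lists its children by handle."""
    if done is None:
        done, out = {}, []
    if id(node) in done:
        return out  # already written: skip the duplicate
    # Assign the handle before recursing so that cycles terminate.
    done[id(node)] = len(done)
    record = {"name": node.name, "children": []}
    out.append(record)
    for child in node.children:
        serialize(child, done, out)
        record["children"].append(done[id(child)])
    return out


a, b, c = Node("a"), Node("b"), Node("c")
a.children = [b, c]
b.children = [c]
c.children = [a]  # closes a cycle back to a
print(serialize(a))
```

Since this is plain recursion, a very deep graph could hit Python's recursion limit. An explicit stack would avoid that, at the cost of slightly more code.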

Another way of course is as a list of nodes and edges, which can be done as XML, or in any other preferred serialization format, or as an adjacency matrix.


Adjacency lists and adjacency matrices are the two common ways of representing graphs in memory. The first decision you need to make when choosing between them is what you want to optimize for. Adjacency lists are very fast if you need to, for example, get the list of a vertex's neighbors. On the other hand, if you are doing a lot of testing for edge existence, or your graph represents a Markov chain, then you'd probably favor an adjacency matrix.
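
As a quick illustration of that trade-off, here is a sketch that builds both representations from the same edge list. The helper names are made up, and nodes are assumed to be numbered 0 to n-1:

```python
def build_adjacency_list(n, edges):
    """adj[v] is the list of vertices that v points to."""
    adj = [[] for _ in range(n)]
    for src, dst in edges:
        adj[src].append(dst)
    return adj


def build_adjacency_matrix(n, edges):
    """mat[u][v] is True when there is an edge from u to v."""
    mat = [[False] * n for _ in range(n)]
    for src, dst in edges:
        mat[src][dst] = True
    return mat


edges = [(0, 1), (0, 2), (1, 2)]
adj = build_adjacency_list(3, edges)
mat = build_adjacency_matrix(3, edges)

neighbors_of_0 = adj[0]   # direct lookup of a vertex's neighbors
has_edge_0_2 = mat[0][2]  # constant-time edge test
```

With the list, the edge test means scanning adj[0]. With the matrix, listing neighbors means scanning a whole row.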

The next question you need to consider is how much you need to fit into memory. In most cases, where the number of edges in the graph is much smaller than the total number of possible edges, an adjacency list is going to be more efficient, since you only need to store the edges that actually exist. A happy medium is to represent the adjacency matrix in compressed sparse row format, in which you keep a vector of the non-zero entries read row by row from top left to bottom right, a corresponding vector indicating which columns the non-zero entries can be found in, and a third vector indicating the start of each row in the column-entry vector. For example, the matrix

[[0.0, 0.0, 0.3, 0.1]
 [0.1, 0.0, 0.0, 0.0]
 [0.0, 0.0, 0.0, 0.0]
 [0.5, 0.2, 0.0, 0.3]]

can be represented as:

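vals: [0.3, 0.1, 0.1, 0.5, 0.2, 0.3]
cols: [2,   3,   0,   0,   1,   3]
rows: [0,   2,   3,   3,   6]

Here rows uses the usual convention of one extra trailing entry (the total number of non-zero values), so row i occupies positions rows[i] up to, but not including, rows[i+1] in vals and cols. The empty third row shows up as two equal consecutive entries (3 and 3).

Compressed sparse row is effectively an adjacency list (the column indices function the same way), but the format lends itself a bit more cleanly to matrix operations.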
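
A minimal Python sketch of the conversion, in case it is useful. The function names to_csr and row_entries are hypothetical. It follows the common convention where the rows vector has one more entry than the matrix has rows, so that row i spans rows[i] up to rows[i+1]:

```python
def to_csr(matrix):
    """Convert a dense matrix (list of lists) into (vals, cols, rows)."""
    vals, cols, rows = [], [], [0]
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                vals.append(value)
                cols.append(col)
        rows.append(len(vals))  # where the next row starts
    return vals, cols, rows


def row_entries(csr, i):
    """Return (column, value) pairs for row i, i.e. the out-edges of vertex i."""
    vals, cols, rows = csr
    return [(cols[k], vals[k]) for k in range(rows[i], rows[i + 1])]


dense = [
    [0.0, 0.0, 0.3, 0.1],
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.2, 0.0, 0.3],
]
csr = to_csr(dense)
print(csr)
print(row_entries(csr, 3))
```

The three vectors are flat lists of numbers, so they can be written out with whatever serialization format you already use.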