 

Neo4j time dependent graph model

I need help modelling my Neo4j graph structure for a time-dependent domain. See the following sketch for the requirements and the problem:

Problem sketch

  • Pictures 1 & 2: For each day I have nodes and relationships between them. I define a relationship as the co-occurrence of two nodes (e.g. words) in some lexical unit (e.g. a sentence). The same node can occur on several days, together with new nodes or with already existing ones. See the following example, where we only consider named entities as nodes:

    • 2013/01/01: Peter was wondering about Cassandra tonight.
    • 2013/01/01: Cassandra wants to stay at home with Peter.
    • ....
    • 2013/01/08: Peter has fallen in love with Judith.
    • 2013/01/08: Cassandra drives Peter to school every day.

    This results in the following graph structure (edge labels are co-occurrence counts):

     - 2013/01/01:
    
        (Peter) <--2--> (Cassandra)
    
     - 2013/01/08:
    
        (Peter) <--1--> (Judith)
    
        (Peter) <--1--> (Cassandra)
    
  • Picture 3: The graph structure should support selecting a time span and finding a path from a start node (P1) to an end node (P2). Here the path is given by the maximum flow between those two nodes, computed over the nodes and relationships accumulated within that time span.

  • Picture 4: It should also be possible to expand nodes according to, e.g., the highest remaining edge weight. Picture 4 shows the expanded graph with three additional nodes.
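
A minimal in-memory sketch of these requirements, written in plain Python rather than as a Neo4j model, may help pin down the intended semantics before choosing a schema. It assumes that named entities are found by matching a fixed, hypothetical list of names (a stand-in for real entity recognition), that days are zero-padded YYYY/MM/DD strings, that "max flow" means the classic maximum-flow value computed with Edmonds-Karp over the accumulated co-occurrence counts used as undirected capacities, and that "expanding" a node set means picking the outside neighbours with the heaviest connecting edges. All function names are illustrative.

```python
from collections import defaultdict, deque
from itertools import combinations

# Hypothetical stand-in for named-entity recognition: a fixed set of known names.
KNOWN_ENTITIES = {"Peter", "Cassandra", "Judith"}


def extract_entities(sentence):
    """Return the known entities mentioned in a sentence, sorted by name."""
    words = {w.strip(".,!?") for w in sentence.split()}
    return sorted(words & KNOWN_ENTITIES)


def daily_graphs(sentences):
    """Map each day to {(a, b): co-occurrence count}, with a < b."""
    graphs = defaultdict(lambda: defaultdict(int))
    for day, sentence in sentences:
        for a, b in combinations(extract_entities(sentence), 2):
            graphs[day][(a, b)] += 1
    return graphs


def accumulate(graphs, start, end):
    """Sum edge weights over all days in [start, end] (zero-padded YYYY/MM/DD strings)."""
    total = defaultdict(int)
    for day, edges in graphs.items():
        if start <= day <= end:
            for edge, weight in edges.items():
                total[edge] += weight
    return dict(total)


def max_flow(edges, source, sink):
    """Edmonds-Karp max flow; each undirected edge weight is a capacity in both directions."""
    if source == sink:
        raise ValueError("source and sink must differ")
    cap = defaultdict(lambda: defaultdict(int))
    for (a, b), weight in edges.items():
        cap[a][b] += weight
        cap[b][a] += weight
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, residual in cap[u].items():
                if residual > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Walk back from the sink to collect the path and its bottleneck capacity.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= bottleneck
            cap[w][u] += bottleneck
        flow += bottleneck


def expand(edges, nodes, k):
    """Return up to k outside neighbours of `nodes`, ranked by their heaviest connecting edge."""
    best = {}
    for (a, b), weight in edges.items():
        for inside, outside in ((a, b), (b, a)):
            if inside in nodes and outside not in nodes:
                best[outside] = max(best.get(outside, 0), weight)
    return sorted(best, key=lambda n: (-best[n], n))[:k]


if __name__ == "__main__":
    sentences = [
        ("2013/01/01", "Peter was wondering about Cassandra tonight."),
        ("2013/01/01", "Cassandra wants to stay at home with Peter."),
        ("2013/01/08", "Peter has fallen in love with Judith."),
        ("2013/01/08", "Cassandra drives Peter to school every day."),
    ]
    span = accumulate(daily_graphs(sentences), "2013/01/01", "2013/01/08")
    print(max_flow(span, "Judith", "Cassandra"))  # 1
    print(expand(span, {"Peter"}, 2))  # ['Cassandra', 'Judith']
```

Keeping one weighted graph per day and summing only the requested span mirrors Pictures 1 and 2. Note that the sketch returns only the flow value; recovering which edges actually carry the flow would require inspecting the final residual capacities, which is not shown here.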

I already know about this work [2] and the multi-level index example [3]. The first model does not support good path finding between nodes from different time frames. Only the latter will be helpful for querying time ranges. I hope somebody can help.

Regards.

asked Jun 06 '14 18:06 by user2715478



1 Answer

There are numerous ways to model time in a graph. One way is to add a timestamp, or even the start and end times of the period in which the relationship was valid, as properties on each relationship. That way, you can query the graph for the subgraphs or paths that were valid at a given time.
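
The idea can be illustrated with a small in-memory Python sketch (not Neo4j code): each relationship carries hypothetical `valid_from`/`valid_to` properties with an inclusive end date, and queries keep only the relationships valid at an instant, or overlapping a span, before searching for a path. In Cypher the equivalent would be a `WHERE` filter on relationship properties; the class, property, and function names here, as well as the `CO_OCCURS` type in the usage, are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TimedRel:
    """A relationship carrying the period in which it was valid (hypothetical schema)."""
    start_node: str
    end_node: str
    rel_type: str
    valid_from: date
    valid_to: date  # inclusive end of the validity period


def valid_at(rels, t):
    """Relationships whose validity period contains the instant t."""
    return [r for r in rels if r.valid_from <= t <= r.valid_to]


def valid_during(rels, start, end):
    """Relationships whose validity period overlaps the span [start, end]."""
    return [r for r in rels if r.valid_from <= end and start <= r.valid_to]


def path_at(rels, t, source, target):
    """Fewest-hops path using only relationships valid at t (direction ignored), or None."""
    adj = {}
    for r in valid_at(rels, t):
        adj.setdefault(r.start_node, []).append(r.end_node)
        adj.setdefault(r.end_node, []).append(r.start_node)
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            # Rebuild the path by following parent links back to the source.
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


if __name__ == "__main__":
    rels = [
        TimedRel("Peter", "Cassandra", "CO_OCCURS", date(2013, 1, 1), date(2013, 1, 8)),
        TimedRel("Peter", "Judith", "CO_OCCURS", date(2013, 1, 8), date(2013, 1, 8)),
    ]
    print(path_at(rels, date(2013, 1, 8), "Judith", "Cassandra"))  # ['Judith', 'Peter', 'Cassandra']
    print(path_at(rels, date(2013, 1, 5), "Judith", "Cassandra"))  # None
```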

Ian Robinson (one of the authors of the Graph Databases book) has written a very good blog post about this topic: http://iansrobinson.com/2014/05/13/time-based-versioned-graphs/

Regarding performance, it is true that filtering on relationship properties is a bit more expensive than filtering only by relationship type. However, you will probably need to benchmark this yourself with your own data set, so I would suggest starting with the simplest model that works for you and then optimizing performance iteratively, if necessary.

answered Sep 21 '22 00:09 by Axel Morgner
