I have a data modeling question. The data that I have is basically nodes with relations to other nodes. Nodes have properties. Edges are directional and have properties. I am exploring if a Graph DB like Neo4j will be appropriate or not.
My doubt is that the data is time-based: it changes over time, and I need to keep track of the historical data as well. For example, I should be able to query:
I searched but couldn't find a satisfactory resource where I could understand how time can be factored into a Graph DB. Do you think my requirement can be inherently met using a Graph DB? Is there an example/resource/article which describes this for Neo4j or any other graph db?
I want to make sure that the database is scalable to about 100K nodes, and millions of edges. I am optimizing for time over space.
Is there an example/resource/article which describes this for Neo4j or any other graph db?
Here is an excellent article from Ian Robinson's blog about time-based versioned graphs.
Basically, the article describes a way to represent a time-based versioned graph by adding some extra nodes and timestamped relationships, so that the state of the graph at a given timestamp can be reconstructed.
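To make the idea concrete, here is a minimal in-memory sketch in Python rather than Neo4j or Cypher. It assumes each relationship carries a validity interval and that a node's properties live in separate state nodes attached through a timestamped STATE relationship. The class and method names (`VersionedGraph`, `link`, `unlink`, `set_state`), the `END_OF_TIME` sentinel and the integer timestamps are all hypothetical choices for illustration, not an API from the article or from Neo4j.

```python
from dataclasses import dataclass

END_OF_TIME = 2**63 - 1  # assumed sentinel meaning "still current"


@dataclass
class Rel:
    start: str                    # source node id
    kind: str                     # relationship type, e.g. "SELLS" or "STATE"
    end: str                      # target node id
    valid_from: int               # timestamp at which the relationship became true
    valid_to: int = END_OF_TIME   # timestamp at which it stopped being true

    def valid_at(self, ts: int) -> bool:
        return self.valid_from <= ts < self.valid_to


class VersionedGraph:
    def __init__(self):
        self.state_props = {}  # state node id -> property dict
        self.rels = []         # every relationship ever created, current or archived

    def link(self, start, kind, end, ts):
        self.rels.append(Rel(start, kind, end, ts))

    def unlink(self, start, kind, end, ts):
        # Archive instead of deleting: close the validity interval at ts.
        for r in self.rels:
            if (r.start, r.kind, r.end) == (start, kind, end) and r.valid_to == END_OF_TIME:
                r.valid_to = ts

    def set_state(self, node, props, ts):
        # State change: close the current STATE relationship, then attach a new state node.
        for r in self.rels:
            if r.start == node and r.kind == "STATE" and r.valid_to == END_OF_TIME:
                r.valid_to = ts
        state_id = f"{node}@{ts}"
        self.state_props[state_id] = dict(props)
        self.link(node, "STATE", state_id, ts)

    def sources(self, end, kind, ts):
        # Nodes that pointed at `end` through `kind` at time ts.
        return [r.start for r in self.rels
                if r.end == end and r.kind == kind and r.valid_at(ts)]

    def state(self, node, ts):
        # Properties of `node` as they were at time ts, or None if it had no state yet.
        ids = [r.end for r in self.rels
               if r.start == node and r.kind == "STATE" and r.valid_at(ts)]
        return self.state_props[ids[0]] if ids else None


g = VersionedGraph()
g.set_state("product:1", {"price": 1.00}, ts=100)
g.link("shop:1", "SELLS", "product:1", ts=100)
g.set_state("product:1", {"price": 2.00}, ts=200)   # state change
g.unlink("shop:1", "SELLS", "product:1", ts=300)    # structural change
g.link("shop:2", "SELLS", "product:1", ts=300)

print(g.state("product:1", 150), g.sources("product:1", "SELLS", 150))
print(g.state("product:1", 350), g.sources("product:1", "SELLS", 350))
```

Nothing is ever deleted in this sketch: an update only closes an interval and adds a new relationship, which is what makes "as of time T" queries possible and also why the data grows with every change.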
The following image from the referenced article shows:
- The price of product_id : 1 has changed from 1.00 to 2.00. This is a state change.
- product_id : 1 is now sold by shop_id : 2 (and not by shop_id : 1). This is a structural change.
Do you think my requirement can be inherently met using a Graph DB?
Yes, but not in an easy or "natural" way. Versioning a time-based model in a database that doesn't offer this functionality natively can be hard and expensive. From the article:
Neo4j doesn’t provide intrinsic support either at the level of its labelled property graph model or in its Cypher query language for versioning. Therefore, to version a graph we need to make our application graph data model and queries version aware.
and
versioning necessarily creates a lot more data – both more nodes and more relationships. In addition, queries will tend to be more complex, and slower, because every MATCH must take account of one or more versioned elements. Given these overheads, apply versioning with care. Perhaps not all of your graph needs to be versioned. If that’s the case, version only those portions of the graph that require it.
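As a rough illustration of how version awareness leaks into queries, the following Python sketch runs a breadth-first traversal "as of" a timestamp. It is not Cypher and not taken from the article: the edge tuples, the `END_OF_TIME` sentinel and the `reachable_at` function are hypothetical, and the linear scan over edges is only for brevity. The point is that every hop has to check the validity interval, which a traversal over an unversioned graph would not need.

```python
from collections import deque

END_OF_TIME = float("inf")  # assumed sentinel meaning "still current"

# Hypothetical edges: (source, target, valid_from, valid_to)
EDGES = [
    ("a", "b", 0, END_OF_TIME),
    ("b", "c", 0, 50),             # archived at ts=50
    ("b", "d", 50, END_OF_TIME),   # replaced b->c at ts=50
    ("d", "e", 80, END_OF_TIME),
]


def reachable_at(edges, start, ts):
    """Return the nodes reachable from `start` using only edges valid at time ts."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for src, dst, valid_from, valid_to in edges:
            # The extra time predicate on every hop is the versioning overhead.
            if src == node and valid_from <= ts < valid_to and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen - {start}


print(sorted(reachable_at(EDGES, "a", 10)))
print(sorted(reachable_at(EDGES, "a", 60)))
print(sorted(reachable_at(EDGES, "a", 90)))
```

Edges that never change can simply keep an open-ended interval, which is one way to read the advice to version only the parts of the graph that need it.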
EDIT:
A few words from the book Graph Databases (by Ian Robinson, Jim Webber and Emil Eifrém) about versioning in graph databases. The book is available for download from the Neo4j site:
Versioning: A versioned graph enables us to recover the state of the graph at a particular point in time. Most graph databases don’t support versioning as a first-class concept. It is possible, however, to create a versioning scheme inside the graph model. With this scheme nodes and relationships are timestamped and archived whenever they are modified. The downside of such versioning schemes is that they leak into any queries written against the graph, adding a layer of complexity to even the simplest query.
This paragraph links to the article mentioned at the beginning of this answer.