Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to Model Real-World Relationships in a Graph Database (like Neo4j)?

I have a general question about modeling in a graph database that I just can't seem to wrap my head around.

How do you model this type of relationship: "Newton invented Calculus"?

In a simple graph, you could model it like this:

Newton (node) -> invented (relationship) -> Calculus (node)

...so you'd have a bunch of "invented" graph relationships as you added more people and inventions.

The problem is, you start needing to add a bunch of properties to the relationship:

  • invention_date
  • influential_concepts
  • influential_people
  • books_inventor_wrote

...and you'll want to start creating relationships between those properties and other nodes, such as:

  • influential_people: relationship to person nodes
  • books_inventor_wrote: relationship to book nodes

So now it seems like the "real-world relationships" ("invented") should actually be a node in the graph, and the graph should look like this:

Newton (node) -> (relationship) -> Invention of Calculus (node) -> (relationship) -> Calculus (node)

And to complicate things more, other people are also participated in the invention of Calculus, so the graph now becomes something like:

Newton (node) -> 
  (relationship) -> 
    Newton's Calculus Invention (node) -> 
      (relationship) -> 
        Invention of Calculus (node) -> 
          (relationship) -> 
            Calculus (node)
Leibniz (node) -> 
  (relationship) -> 
    Leibniz's Calculus Invention (node) -> 
      (relationship) -> 
        Invention of Calculus (node) -> 
          (relationship) -> 
            Calculus (node)

So I ask the question because it seems like you don't want to set properties on the actual graph database "relationship" objects, because you may want to at some point treat them as nodes in the graph.

Is this correct?

I have been studying the Freebase Metaweb Architecture, and they seem to be treating everything as a node. For example, Freebase has the idea of a Mediator/CVT, where you can create a "Performance" node that links an "Actor" node to a "Film" node, like here: http://www.freebase.com/edit/topic/en/the_last_samurai. Not quite sure if this is the same issue though.

What are some guiding principles you use to figure out if the "real-world relationship" should actually be a graph node rather than a graph relationship?

If there are any good books on this topic I would love to know. Thanks!

like image 883
Lance Avatar asked Sep 24 '11 00:09

Lance


People also ask

How are relationships stored in Neo4j?

Properties are stored as a linked list of property records, each holding a key and value and pointing to the next property. Each node and relationship references its first property record. The Nodes also reference the first relationship in its relationship chain. Each Relationship references its start and end node.

What is relationship in graph database?

The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation. Graph databases hold the relationships between data as a priority. Querying relationships is fast because they are perpetually stored in the database.

What is a relationship in Neo4j?

The Neo4j property graph database model consists of: Nodes describe entities (discrete objects) of a domain. Nodes can have zero or more labels to define (classify) what kind of nodes they are. Relationships describes a connection between a source node and a target node.

What is a graph database Neo4j?

Neo4j is a graph database. A graph database, instead of having rows and columns has nodes edges and properties. It is more suitable for certain big data and analytics applications than row and column databases or free-form JSON document databases for many use cases. A graph database is used to represent relationships.


1 Answers

Some of these things, such as invention_date, can be stored as properties on the edges as in most graph databases edges can have properties in the same way that vertexes can have properties. For example you could do something like this (code follows TinkerPop's Blueprints):

Graph graph = new Neo4jGraph("/tmp/my_graph");
Vertex newton = graph.addVertex(null);
newton.setProperty("given_name", "Isaac");
newton.setProperty("surname", "Newton");
newton.setProperty("birth_year", 1643); // use Gregorian dates...
newton.setProperty("type", "PERSON");

Vertex calculus = graph.addVertex(null);
calculus.setProperty("type", "KNOWLEDGE");

Edge newton_calculus = graph.addEdge(null, newton, calculus, "DISCOVERED");
newton_calculus.setProperty("year", 1666);   

Now, lets expand it a little bit and add in Liebniz:

Vertex liebniz = graph.addVertex(null);
liebniz.setProperty("given_name", "Gottfried");
liebniz.setProperty("surnam", "Liebniz");
liebniz.setProperty("birth_year", "1646");
liebniz.setProperty("type", "PERSON");

Edge liebniz_calculus = graph.addEdge(null, liebniz, calculus, "DISCOVERED");
liebniz_calculus.setProperty("year", 1674);

Adding in the books:

Vertex principia = graph.addVertex(null);
principia.setProperty("title", "Philosophiæ Naturalis Principia Mathematica");
principia.setProperty("year_first_published", 1687);
Edge newton_principia = graph.addEdge(null, newton, principia, "AUTHOR");
Edge principia_calculus = graph.addEdge(null, principia, calculus, "SUBJECT");

To find out all of the books that Newton wrote on things he discovered we can construct a graph traversal. We start with Newton, follow the out links from him to things he discovered, then traverse links in reverse to get books on that subject and again go reverse on a link to get the author. If the author is Newton then go back to the book and return the result. This query is written in Gremlin, a Groovy based domain specific language for graph traversals:

newton.out("DISCOVERED").in("SUBJECT").as("book").in("AUTHOR").filter{it == newton}.back("book").title.unique()

Thus, I hope I've shown a little how a clever traversal can be used to avoid issues with creating intermediate nodes to represent edges. In a small database it won't matter much, but in a large database you're going to suffer large performance hits doing that.

Yes, it is sad that you can't associate edges with other edges in a graph, but that's a limitation of the data structures of these databases. Sometimes it makes sense to make everything a node, for example, in Mediator/CVT a performance has a bit more concreteness too it. Individuals may wish address only Tom Cruise's performance in "The Last Samurai" in a review. However, for most graph databases I've found that application of some graph traversals can get me what I want out of the database.

like image 50
Pridkett Avatar answered Oct 08 '22 14:10

Pridkett