 

Neo4j - Storing data in relationships vs. nodes

I was wondering whether there are any advantages or disadvantages to storing data in relationships versus nodes.

For example, if I were to store comments on a discussion in the DB, should I store the comment data in a "comment" relationship, or in a "comment" node that is connected to the discussion through a separate relationship?

asked by RazorHead, Dec 19 '22 07:12



1 Answer

The correct data model depends on the types of queries you need to make. You should figure out what your queries are, and then determine a data model that meets these criteria:

  1. It allows you to answer all your queries,
  2. It allows your queries to finish sufficiently quickly,
  3. It minimizes the DB storage needed.

In the case of discussion comments, it is likely that you want to query for discussion threads, ordered chronologically. Therefore, you need to store not just the times at which comments are made, but also the relationships between the comments (because a discussion can spawn disjoint threads that do not share the same prior comments).

Let's try a simple test case. Suppose there are 2 disjoint threads spawned by the same initial comment (which we'll call c1): [c1, c3] and [c1, c2, c4]. And suppose, in this simple test case, that we are only interested in querying for all comment threads related to a subject.

If comment properties are stored in nodes, the data might look like:

CREATE (u1:User {name: "A"})-[:MADE]->(c1:Comment {time: 0, text: "Fee"})-[:ABOUT]->(s1:Subject {title: "Jack"}),
       (u2:User {name: "B"})-[:MADE]->(c2:Comment {time: 1, text: "Fie"})-[:ABOUT]->(c1),
       (u3:User {name: "C"})-[:MADE]->(c3:Comment {time: 3, text: "Foe"})-[:ABOUT]->(c1),
       (u4:User {name: "D"})-[:MADE]->(c4:Comment {time: 9, text: "Fum"})-[:ABOUT]->(c2)
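To make the point about chronological ordering concrete, here is a minimal plain-Python sketch of these four comments (no Neo4j involved). The `Comment` class and its `about` field are hypothetical stand-ins for a comment node and the target of its outgoing ABOUT relationship. Sorting by `time` alone gives one interleaved list, while following the ABOUT links recovers the separate threads.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    time: int
    text: str
    about: str  # hypothetical: key of the node this comment's ABOUT relationship points to


# Same four comments as the sample data; "s1" is the Subject node, not a Comment.
comments = {
    "c1": Comment(0, "Fee", "s1"),
    "c2": Comment(1, "Fie", "c1"),
    "c3": Comment(3, "Foe", "c1"),
    "c4": Comment(9, "Fum", "c2"),
}


def timeline():
    """Order all comments by time only, ignoring the ABOUT links."""
    return [c.text for c in sorted(comments.values(), key=lambda c: c.time)]


def thread_ending_at(key):
    """Follow ABOUT links from a comment back to the subject, then return the texts oldest first."""
    texts = []
    while key in comments:  # stop once we reach a non-comment node (the subject)
        texts.append(comments[key].text)
        key = comments[key].about
    return list(reversed(texts))


print(timeline())              # ['Fee', 'Fie', 'Foe', 'Fum']
print(thread_ending_at("c3"))  # ['Fee', 'Foe']
print(thread_ending_at("c4"))  # ['Fee', 'Fie', 'Fum']
```

The time-only list puts "Foe" between "Fie" and "Fum" even though it belongs to a different thread, which is why the comment-to-comment relationships have to be stored alongside the timestamps.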

If you instead stored the comment properties in relationships, you might try something like the following, but there is a big flaw: a relationship cannot point directly to another relationship (as lines 2 to 4 attempt to do). Since this model is not legal in Neo4j, it fails to meet any of the criteria above.

(u1:User {name: "A"})-[c1:COMMENTED_ABOUT {time:0, text: "Fee"}]->(s1:Subject {title: "Jack"})
(u2:User {name: "B"})-[c2:COMMENTED_ABOUT {time:1, text: "Fie"}]->(c1)
(u3:User {name: "C"})-[c3:COMMENTED_ABOUT {time:3, text: "Foe"}]->(c1)
(u4:User {name: "D"})-[c4:COMMENTED_ABOUT {time:9, text: "Fum"}]->(c2)

Therefore, in our simple test case, it looks like storing the properties in nodes is the only choice.

Here is a query that returns the disjoint thread paths, including the user who made each comment. The WHERE clause keeps only paths that end at a comment nobody has replied to, which filters out partial threads:

MATCH p=(s:Subject)<-[:ABOUT*]-(c:Comment)<-[m:MADE]-(u:User)
WHERE NOT (c)<-[:ABOUT]-()
RETURN p
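To see what the WHERE clause contributes, the following Python sketch emulates the query in memory over the same sample graph. It is only an illustration under simplifying assumptions: each comment has exactly one outgoing ABOUT and one incoming MADE relationship, node keys reuse the variable names from the sample data, and the helper names (`outgoing`, `incoming`, `match_threads`) are hypothetical. It does not use Neo4j or its driver.

```python
# In-memory copy of the sample graph: node key -> label, plus (start, type, end) relationships.
labels = {
    "s1": "Subject",
    "c1": "Comment", "c2": "Comment", "c3": "Comment", "c4": "Comment",
    "u1": "User", "u2": "User", "u3": "User", "u4": "User",
}
rels = [
    ("u1", "MADE", "c1"), ("c1", "ABOUT", "s1"),
    ("u2", "MADE", "c2"), ("c2", "ABOUT", "c1"),
    ("u3", "MADE", "c3"), ("c3", "ABOUT", "c1"),
    ("u4", "MADE", "c4"), ("c4", "ABOUT", "c2"),
]


def outgoing(node, rel_type):
    return [end for start, t, end in rels if start == node and t == rel_type]


def incoming(node, rel_type):
    return [start for start, t, end in rels if end == node and t == rel_type]


def match_threads(leaves_only=True):
    """Return paths as (subject, comment, ..., comment, user) tuples.

    leaves_only=True mimics WHERE NOT (c)<-[:ABOUT]-(); False drops that filter.
    """
    paths = []
    for c, label in labels.items():
        if label != "Comment":
            continue
        if leaves_only and incoming(c, "ABOUT"):
            continue  # someone replied to c, so a path ending here is a partial thread
        # Walk ABOUT links from c up to the Subject (assumes one outgoing ABOUT per comment).
        chain = [c]
        while labels[chain[-1]] != "Subject":
            chain.append(outgoing(chain[-1], "ABOUT")[0])
        for u in incoming(c, "MADE"):
            paths.append(tuple(reversed(chain)) + (u,))
    return paths


for path in match_threads():
    print(path)
# ('s1', 'c1', 'c3', 'u3')
# ('s1', 'c1', 'c2', 'c4', 'u4')
print(len(match_threads(leaves_only=False)))  # 4: one path per comment, partial threads included
```

With the filter, only the paths ending at c3 and c4 survive, matching the two threads [c1, c3] and [c1, c2, c4]; without it, the variable-length ABOUT pattern also matches the partial threads ending at c1 and c2.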
answered by cybersam, Dec 25 '22 00:12
