I am just getting started with Neo4j and Graph Databases and was wondering if nested hierarchical trees are a good use case for Neo4j. A common example would be a nested set of comments. For example:
- Article
- Comment 1 on Article
- Comment 2 on Comment 1
- Comment 3 on Comment 1
- Comment 4 on Comment 3
- Comment 5 on Article
As I understand it, the article and the comments would each be nodes. And each comment would have a parent-child relationship. Getting all direct comments on the article (1 and 5) would be easy. But how about retrieving the whole set?
Please excuse the use of layman terms. I figured it is better this way then trying to use the appropriate words while confusing everyone.
Well, because trees actually are graphs using a graph database to store a tree seems entirely appropriate to me. What will perform the best will depend on your data access patterns, but basically trees are just a specialization of graphs.
Yes, in your case the "elements in the tree" would be nodes, and the "nesting" would be relationships. So here's how you might mock up your example:
CREATE (root:Article {label: "Article"}),
(c1:Comment {label: "Comment 1"}),
(c1a:Comment {label: "Comment 2 on comment 1"}),
(c1b:Comment {label: "Comment 3 on comment 1"}),
(c1b1:Comment {label: "Comment 4 on comment 3"}),
(c2:Comment {label: "Comment 5 on article"}),
(root)<-[:reply]-(c1),
(c1)<-[:reply]-(c1a),
(c1)<-[:reply]-(c1b),
(c1b)<-[:reply]-(c1b1),
(root)<-[:reply]-(c2);
This just creates a bunch of nodes and relationships, mimicking the structure you provided. Here I've chosen to always use :reply
to connect things.
Now, "getting everything" just means traversing all of the :reply
relationships we created:
MATCH p=(a:Article {label: "Article"})<-[:reply*1..]-(otherThing)
WITH nodes(p) as nodes
RETURN nodes[length(nodes)-2] as InResponseTo,
nodes[length(nodes)-1] as CommentOrReply;
What this query does is traverse any number of :reply
links (starting from the root "Article"). Then it only looks at the nodes in that path, and returns the last item (CommentOrReply
) and what it was in response to (the second last item).
The result looks like this:
+-------------------------------------------------------------------------------------+
| InResponseTo | CommentOrReply |
+-------------------------------------------------------------------------------------+
| Node[18]{label:"Article"} | Node[19]{label:"Comment 1"} |
| Node[19]{label:"Comment 1"} | Node[20]{label:"Comment 2 on comment 1"} |
| Node[19]{label:"Comment 1"} | Node[21]{label:"Comment 3 on comment 1"} |
| Node[21]{label:"Comment 3 on comment 1"} | Node[22]{label:"Comment 4 on comment 3"} |
| Node[18]{label:"Article"} | Node[23]{label:"Comment 5 on article"} |
+-------------------------------------------------------------------------------------+
So that's how you traverse an entire tree.
EDIT - for what it's worth, "variable length path matching", which in the query above was just this bit: <-[:reply*1..]-
is for me one of the top 3 selling points for a graph database. It's precisely that which graph DBs make really easy, that most of your other alternatives like relational make into a tortuously painful exercise. So if you need to do that sort of thing (like here, assembling an entire tree) I would claim that argues for a graph DB because you're using it in its area of fundamental strength.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With