I am modelling a graph for an application that I am building currently, where I have n Users connected to n Users, I also have n Posts which can be liked by n Users. So the structure would look something like this, for a given user,
if a user likes hundred Post nodes, It would generate 100 edges (realtionships) to the node, when the post is n, the edges will also be n. so one user will be connected to n Users and n posts and n future node types.
So of using an intermediate node thus reducing the edges to the given node, which would look something like this,
Where the users has an intermediate node named Collection, which will be connected to likes, since this is a property graph, I can add a property to the intermediate node and make it behave like the connections are from the user (something like, Likes.username = User.username)
This will similar to this question (Graph database modelling: Should i use a collection node to avoid to many rel on a node)
my thought is
This way of intermediate connecting nodes can isolate junk from the primary node, thus can speed up custom algorithms.
My questions,
This solution will have advantages and disadvantages.
The main disadvantage is that traversal operations will be more expensive, ie. you will have to traverse one more node before finding the posts.
The advantages are following:
About your questions, the best solution is the solution that better fits your workload: if you do a lot of updates on the User, the second solution will give you immediate advantages; if you fetch users alone very often, the second solution will give advantages as well; on the other hand, if your main concern is fast traversal, the second solution won't be a very good fit.
According to the very good book Learning Neo4j (by Rik Van Bruggen, available for download in the Neo4j's web site) your problem is called "Dense node" or "Supernode": nodes that have too many connections.
Still according to the book, supernodes
"becomes a real problem for graph traversals because the graph database management system will have to evaluate all of the connected relationships to that node in order to determine what the next step will be in the graph traversal."
The solution proposed by Rik is very close to your solution (Add an intermediate node): it consists of add a "meta" node between a User and your liked Posts. This meta node should got at most a hundred of connections. If the current meta node reaches 100 connections a new meta node must be created and added to the hierarchy, according to the example of figure, showing a example with popular artists and your fans:
The Neo4j team has been doing a great effort to improve the performance about supernodes, as can be seen in this Github commit (for example) that changes how relationships for a node are stored on the disk, in a linked list structure.
I believe that the best is keep your graph model as simple as possible at first. You don't have a dense node problem (yet) and a premature optimization probably will add some not necessary complexity to your model. If in the future dense nodes becomes a problem you can change your model more accurately. Simplicity is a good choice at first.
You can read a bit more about super nodes in graph database in these links:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With