I am working on describing a tree structured data set in Neo4j. In my current model, a node can have n links to other nodes, making it the child node of those nodes:
root
|
\-- A
|
\-- 1
|
\-- 2
Because I am using nodejs (with node-neo4j), reading of the database is restricted to using Cypher. To read the nodes I use the following query:
START n=node(1)
-- this is the root node
MATCH (n)<-[linkedby:links*]-x, x-[linksto?:links*1]->()
-- get all nodes that link to the root node and all nodes linking to those nodes, etc, etc and retrieve all nodes those found nodes link to
RETURN n, x, linkedby, LAST(linkedby) AS parent, COLLECT(DISTINCT linksto) AS links
-- the last in the path of linkedby is the direct parent
ORDER BY length(linkedby), parent.rank?
-- ordering so that parents are fetched before their children from the result set
My problem: this query becomes very slow (>1 s), as the number of nodes and relationships rise.
Is there a better way of modeling the nodes and relationships? Do I need different relationships between parent nodes and child nodes? Or should I change the query in any way?
Thanks for any pointers!
1) The real world problem: A kind of business process tool, where services are linked to processes, organisations, team, etc to give information which services are required when and by whom and also giving information who will provide this service or is responsible.
So for instance:
Service S is used in Process P1 and P2:
P1 P2
| |
\---+-- S
Service S is managed by Team T:
T
|
\-- S
Team T is part of Organisation O:
O
|
\-- T
Tree:
root
|
|
+-- Processes
| |
| +-- P1
| | |
| | \-- S
| |
| \-- P2
| | |
| | \-- S
|
+-- Organisations
| |
| +-- O
| |
| \-- T
| |
| \-- S
2) My data in console.neo4j.org:
CREATE
(r {name:'root'}),
(ps {name: 'Processes'})-[:links]->r,
(p1 {name: 'P1'})-[:links]->ps,
(p2 {name: 'P2'})-[:links]->ps,
(s {name: 'Service'})-[:links]->p1,
s-[:links]->p2,
(org {name: 'Organisations' })-[:links]->r,
(t {name: 'Team'})-[:links]->org,
s-[:links]->t
So, I had a talk with Michael Hunger the other day and he acknowledged that Cypher isn't good (yet) at processing recursive queries. This should change in 2.1, but until then he proposed I write an unmanaged extension, which I could then call from nodejs.
This completely solved my problem: Retrieving the tree using plain Java is about 10x faster then using Cypher.
Simplified code:
public class EntitiesStream {
public Entity load(long nodeId) {
Node n = database.getNodeById(nodeId);
Entity entity = Entity.from(n);
loadChildren(n, entity);
return entity;
}
private void loadChildren(Node n, Entity e) {
for (Relationship linksFrom: n.getRelationships(Direction.INCOMING, Relationships.links)) {
Node startNode = linksFrom.getStartNode();
Entity childEntity = Entity.from(startNode);
e.addChild(((Number) linksFrom.getProperty("rank")).longValue, childEntity);
this.loadChildren(startNode, childEntity);
}
}
}
public class Entity {
private final TreeMap<Long, Entity> sortedChildren;
Entity(long dbId) {
this.sortedChildren = new TreeMap<>();
// ...
}
public static Entity from(Node node) {
Entity e = new Entity(node.getId());
// ...
return e;
}
public void addChild(Long rank, Entity childEntity) {
sortedChildren.put(rank, childEntity);
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With