I need to group the data from a Neo4j database and then filter out everything except the top n records of every group.
Example:
I have two node types : Order and Article. Between them there is an "ADDED" relationship. "ADDED" relationship has a timestamp property. What I want to know (for every article) is how many times it was among the first two articles added to an order. What I tried is the following approach:
1. Get all the Order-[ADDED]-Article patterns;
2. Sort the result from step 1 by order id as the first sorting key and then by the timestamp of the ADDED relationship as the second sorting key;
3. For every subgroup from step 2 representing one order, keep only the top 2 rows;
4. Count distinct article ids in the output of step 3.
My problem is that I got stuck at step 3. Is it possible to get top 2 rows for every subgroup representing an order?
Thanks,
Tiberiu
Here are some solutions for applying a limit to MATCH results per row. Neo4j 4.1 introduced correlated subqueries, which let us run a subquery using variables present mid-query. Since subqueries execute once per row, we can perform the MATCH and apply the LIMIT inside the subquery, which is the easiest way to limit match results per row.
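A minimal sketch of the subquery approach, applied to the Order/Article model from the question. It assumes Neo4j 4.1 or newer and reuses the property names from the answer further down (`t` for the timestamp on ADDED, `aid` for the article id); adjust these to the real schema.

```cypher
// For each order, keep only the two earliest ADDED articles,
// then count how often each article appears among them.
MATCH (o:Order)
CALL {
  WITH o                              // import the current order into the subquery
  MATCH (o)-[r:ADDED]->(a:Article)
  RETURN a
  ORDER BY r.t ASC                    // earliest additions first
  LIMIT 2                             // applied per order, not globally
}
RETURN a.aid AS articleId, count(*) AS count
```

Because the ORDER BY and LIMIT sit inside the subquery, they are evaluated separately for every order row produced by the outer MATCH.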
One common solution is to collect() the matched nodes and take the slice of interest. In Neo4j 3.1.x and newer, pattern comprehension can be used as a shorthand for the same approach. If only one element of the collection is needed, the head() function can be used to get the first element from the pattern comprehension.
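The three variants might look roughly like this. The :ACTED_IN relationship comes from the text; the Actor and Movie labels and the slice size of 3 are assumptions made for illustration.

```cypher
// 1. Collect everything, then slice off the first three movies per actor
MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
RETURN a, collect(m)[..3] AS movies;

// 2. Same idea with a pattern comprehension (Neo4j 3.1.x and newer)
MATCH (a:Actor)
RETURN a, [(a)-[:ACTED_IN]->(m:Movie) | m][..3] AS movies;

// 3. Only one element needed: take the head of the comprehension
MATCH (a:Actor)
RETURN a, head([(a)-[:ACTED_IN]->(m:Movie) | m]) AS movie;
```

Note that none of these variants defines which elements come first. If the slice should follow a particular order, sort the rows with ORDER BY before collecting, which pattern comprehension cannot do.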
While these approaches work when there are few relationships per node, they may become infeasible on supernodes with large numbers of relationships, because all :ACTED_IN relationships must be expanded before collecting. Before Neo4j 4.1 there was no native subquery support aside from pattern comprehension, and pattern comprehension doesn't support LIMIT.
With Neo4j 3.1.3 and higher, and APOC Procedures 3.1.3.6 and higher, you can use the path expander's limit feature to limit expansion to certain nodes. The limit parameter is only usable with path expander procedures that take a config map, and only when using the end node (>) or termination (/) label filters.
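A hedged sketch of the APOC variant, assuming the APOC plugin is installed and again using assumed Actor and Movie labels with a limit of 3. Check the config keys against the documentation for the installed APOC version.

```cypher
// Expand at most one hop along outgoing ACTED_IN relationships and
// stop after three Movie nodes have been found for each actor.
MATCH (a:Actor)
CALL apoc.path.subgraphNodes(a, {
  maxLevel: 1,                     // direct neighbours only
  relationshipFilter: 'ACTED_IN>', // outgoing ACTED_IN
  labelFilter: '/Movie',           // termination filter, required for limit
  limit: 3
}) YIELD node
RETURN a, collect(node) AS movies
```

Unlike collecting and slicing, the expander can stop early, which matters on supernodes.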
Try
MATCH (o:Order)-[r:ADDED]->(a:Article)
WITH o, r, a
ORDER BY o.oid, r.t
WITH o, COLLECT(a)[..2] AS topArticlesByOrder
UNWIND topArticlesByOrder AS a
RETURN a.aid AS articleId, COUNT(*) AS count
Results look like
articleId count
8 6
2 2
4 5
7 2
3 3
6 5
0 7
on this sample graph created with
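// Create 15 orders, each with 5 ADDED relationships to randomly chosen articles
FOREACH (opar IN range(1, 15) |
  MERGE (o:Order {oid: opar})
  FOREACH (apar IN range(1, 5) |
    MERGE (a:Article {aid: toInteger(rand() * 10)})
    CREATE (o)-[:ADDED {t: timestamp() - toInteger(rand() * 1000)}]->(a)
  )
)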
Use LIMIT combined with ORDER BY to get the top N of anything. For example, the top 5 scores would be:
MATCH (node:MyScoreNode)
RETURN node
ORDER BY node.score DESC
LIMIT 5;
The ORDER BY part ensures the highest scores show up first. The LIMIT gives you only the first 5, which, since they're sorted, are always the highest.