
Getting top n records for each group in neo4j

Tags: neo4j

I need to group the data from a neo4j database and then to filter out everything except the top n records of every group.

Example:

I have two node types, Order and Article, connected by an "ADDED" relationship that has a timestamp property. For every article, I want to know how many times it was among the first two articles added to an order. I tried the following approach:

  1. get all the Order-[ADDED]-Article rows;

  2. sort the result from step 1 by order id as the first sorting key and by the timestamp of the ADDED relationship as the second sorting key;

  3. for every subgroup from step 2 representing one order, keep only the top 2 rows;

  4. count distinct article ids in the output of step 3.

My problem is that I got stuck at step 3. Is it possible to get top 2 rows for every subgroup representing an order?

Thanks,

Tiberiu

tiberiu asked Oct 05 '15 14:10




2 Answers

Try

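MATCH (o:Order)-[r:ADDED]->(a:Article)
// Sort by order id, then by the time each article was added
WITH o, r, a
ORDER BY o.oid, r.t
// Keep only the first two articles of each order
WITH o, COLLECT(a)[..2] AS topArticlesByOrder
UNWIND topArticlesByOrder AS a
// Count how often each article is among an order's first two
RETURN a.aid AS articleId, COUNT(*) AS count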

Results look like

articleId    count
   8           6
   2           2
   4           5
   7           2
   3           3
   6           5
   0           7

on this sample graph created with

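// 15 orders, each with 5 ADDED relationships to randomly chosen articles
// (newer Neo4j versions name the TOINT function toInteger)
FOREACH(opar IN RANGE(1,15) |
    MERGE (o:Order {oid:opar})
    FOREACH(apar IN RANGE(1,5) |
        MERGE (a:Article {aid:TOINT(RAND()*10)})
        CREATE (o)-[:ADDED {t:timestamp() - TOINT(RAND()*1000)}]->(a)
    )
)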
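
To check the "top 2 per order" step on its own, something like the following may help. It is only a sketch that reuses the labels and properties from the query above (Order, Article, ADDED, oid, aid, t) and assumes that COLLECT keeps the rows in the order set up by the preceding ORDER BY, which is the same assumption the main query relies on. The names orderId and firstTwoArticleIds are made up for this example.

```cypher
// For each order, list the ids of the first two articles added to it
MATCH (o:Order)-[r:ADDED]->(a:Article)
WITH o, r, a
ORDER BY o.oid, r.t
// Grouping key is o.oid; the slice [..2] keeps at most two ids per order
RETURN o.oid AS orderId, COLLECT(a.aid)[..2] AS firstTwoArticleIds
ORDER BY orderId
```

Each returned row corresponds to one order, so it is easy to compare a few of them by hand against the ADDED timestamps before running the counting query.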
jjaderberg answered Nov 13 '22 15:11


Use LIMIT combined with ORDER BY to get the top N of anything. For example, the top 5 scores would be:

MATCH (node:MyScoreNode) 
RETURN node
ORDER BY node.score DESC
LIMIT 5;

ORDER BY with DESC puts the highest scores first, and LIMIT 5 then keeps only the first five rows, which, because the rows are sorted, are the five highest scores.
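
Note that a LIMIT written this way applies to the whole result, not to each group. One way to apply the same ORDER BY plus LIMIT idea per order is a CALL subquery, which runs once for every incoming row. The sketch below assumes Neo4j 4.1 or later (where a subquery can import variables with a leading WITH) and borrows the Order, Article and ADDED names and the t and aid properties from the question and the other answer. The alias timesInFirstTwo is made up for this example.

```cypher
MATCH (o:Order)
CALL {
  // Import the current order into the subquery
  WITH o
  MATCH (o)-[r:ADDED]->(a:Article)
  // Earliest additions first, then keep two rows for this order only
  WITH a, r
  ORDER BY r.t
  LIMIT 2
  RETURN a
}
// Count how often each article was among the first two of an order
RETURN a.aid AS articleId, count(*) AS timesInFirstTwo
ORDER BY timesInFirstTwo DESC
```

Because the LIMIT sits inside the subquery, it is evaluated separately for each order instead of once for the entire result.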

FrobberOfBits answered Nov 13 '22 15:11