Getting top n records for each group in neo4j

Tags: neo4j

I need to group data from a Neo4j database and then keep only the top n records of each group.

Example:

I have two node types: Order and Article. They are connected by an "ADDED" relationship, which has a timestamp property. For every article, I want to know how many times it was among the first two articles added to an order. I tried the following approach:

1. Get all the Order-[ADDED]-Article matches.
2. Sort the result of step 1 by order id as the first sorting key and by the timestamp of the ADDED relationship as the second sorting key.
3. For every subgroup from step 2 that represents one order, keep only the top 2 rows.
4. Count the distinct article ids in the output of step 3.

My problem is that I got stuck at step 3. Is it possible to get the top 2 rows of every subgroup representing an order?

Thanks,

Tiberiu

asked Oct 05 '15 14:10

tiberiu

2 Answers

Try

MATCH (o:Order)-[r:ADDED]->(a:Article)
WITH o, r, a
ORDER BY o.oid, r.t
WITH o, COLLECT(a)[..2] AS topArticlesByOrder
UNWIND topArticlesByOrder AS a
RETURN a.aid AS articleId, COUNT(*) AS count

Results look like

articleId    count
   8           6
   2           2
   4           5
   7           2
   3           3
   6           5
   0           7

on this sample graph created with

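// Create 15 orders, each with 5 ADDED relationships to randomly chosen articles
FOREACH (opar IN range(1, 15) |
    MERGE (o:Order {oid: opar})
    FOREACH (apar IN range(1, 5) |
        MERGE (a:Article {aid: toInteger(rand() * 10)})
        // Random offset of up to 1000 ms so the timestamps within an order differ
        CREATE (o)-[:ADDED {t: timestamp() - toInteger(rand() * 1000)}]->(a)
    )
)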
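
To check step 3 of the question on its own, it can help to look at the collected list per order before unwinding it. This is a minimal sketch that reuses the labels and properties of the sample graph (oid, aid, t); the aliases orderId and firstTwo are made up for illustration. It assumes, as the query above does, that collect() keeps the row order produced by the preceding ORDER BY.

```cypher
// One row per order, with the ids of the first two articles added to it
MATCH (o:Order)-[r:ADDED]->(a:Article)
WITH o, r, a
ORDER BY o.oid, r.t                     // earliest ADDED relationship first within each order
WITH o, collect(a.aid)[..2] AS firstTwo // grouping by o, keep only the first two ids
RETURN o.oid AS orderId, firstTwo
ORDER BY orderId
```

On the sample graph this should return 15 rows, each with a list of two article ids. Because the sample data is random, the same article can be added to an order more than once and may then appear twice in a list.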

answered Nov 13 '22 15:11

jjaderberg

Use LIMIT combined with ORDER BY to get the top N of anything. For example, the top 5 scores would be:

MATCH (node:MyScoreNode) 
RETURN node
ORDER BY node.score DESC
LIMIT 5;

The ORDER BY part ensures the highest scores show up first. LIMIT then returns only the first 5 rows, which, because the rows are sorted, are always the highest.
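
Note that a LIMIT at the end of a query applies to the whole result, so on its own it gives a global top N rather than a top N per group. One possible way to apply it per order is to put the ORDER BY and LIMIT inside a correlated subquery. The sketch below assumes Neo4j 4.1 or later, where CALL { ... } subqueries can import variables with a leading WITH, and reuses the Order, ADDED, and Article names and the t and aid properties from the question.

```cypher
MATCH (o:Order)
CALL {
  WITH o                          // import the current order into the subquery
  MATCH (o)-[r:ADDED]->(a:Article)
  WITH a, r
  ORDER BY r.t                    // earliest additions first
  LIMIT 2                         // the limit applies per order, not globally
  RETURN a
}
RETURN a.aid AS articleId, count(*) AS count
ORDER BY count DESC
```

The subquery runs once for each order, so ORDER BY and LIMIT only ever see the rows of that one order.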

answered Nov 13 '22 15:11

FrobberOfBits
