Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using neo4j DISTINCT and ORDER BY on different properties

I am trying to run the following cypher query on neo4j version 1.9M02

START me=node(2)
            MATCH me-[:FOLLOWS]->friends
            ,friends-[fr:ADDED|STOCKS]->products
            ,me-[r?]->products
            WHERE r is null
            RETURN products._id
            ORDER BY fr.CreatedOn DESC
            SKIP 0
            LIMIT 10

I expect to get all products that are ADDED or STOCKED by my friends which are not ADDED or STOCKED by me, ordered by the relation creation time. This query returns correct results but has duplicate products._id values (One user ADDED the product and other user STOCKED it). But I need only one instance of this products._id so I tried

START me=node(2)
            MATCH me-[:FOLLOWS]->friends
            ,friends-[fr:ADDED|STOCKS]->products
            ,me-[r?]->products
            WHERE r is null
            RETURN DISTINCT products._id
            ORDER BY fr.CreatedOn DESC
            SKIP 0
            LIMIT 10

(Added the DISTINCT for products._id) But this time I got Unknown identifier fr error. So I added fr to RETURN statement

START me=node(2)
            MATCH me-[:FOLLOWS]->friends
            ,friends-[fr:ADDED|STOCKS]->products
            ,me-[r?]->products
            WHERE r is null
            RETURN DISTINCT products._id,fr
            ORDER BY fr.CreatedOn DESC
            SKIP 0
            LIMIT 10

This query works without any error but returns duplicate product ids as before.

I've been only playing with neo4j for last couple of days, so not expert at all. I'd really appreciate if someone could help me out here.

like image 722
Optimus Avatar asked Jan 15 '23 16:01

Optimus


2 Answers

I have created a little case with console.neo4j.org

With the following query, I get back [4, 3, 2, 4, 3, 2, 4, 3 ,2]:

START me=node(1) 
MATCH me-[:FOLLOWS]->friends
,friends-[fr:ADDED|STOCKS]->products
,me-[r?]->products 
WHERE r is null 
with distinct products._id as id, fr.CreatedOn as CreatedOn 
ORDER BY CreatedOn DESC 
RETURN id

Putting distinct on the last line, and I get back [3, 4, 2]:

START me=node(1) 
MATCH me-[:FOLLOWS]->friends
,friends-[fr:ADDED|STOCKS]->products
,me-[r?]->products 
WHERE r is null 
with distinct products._id as id, fr.CreatedOn as CreatedOn 
ORDER BY CreatedOn DESC 
RETURN distinct id

It looks like the ordering gets lost with distinct.

EDIT

After filing an issue on github on this, it did dot take long before I got an answer.

The suggestion in the answer was to run this query:

START me=node(1) 
MATCH me-[:FOLLOWS]->friends-[fr:ADDED|STOCKS]->product, 
WHERE not(me-->product)
RETURN product._id as id, min(fr.CreatedOn) as CreatedOn 
ORDER BY CreatedOn DESC 

I modified the query so that it only returns the ids:

START me=node(1) 
MATCH me-[:FOLLOWS]->friends-[fr:ADDED|STOCKS]->product 
WHERE not(me-->product) 
WITH product._id as id, min(fr.CreatedOn) as CreatedOn 
ORDER BY CreatedOn DESC 
RETURN id

And voila, it returns [4, 3, 2]! (I have also created an updated test case.)

like image 128
Werner Kvalem Vesterås Avatar answered Feb 08 '23 13:02

Werner Kvalem Vesterås


By using the aggregate function MIN, you can extract the earliest CreatedOn date. Once you have that, simple order the result set by the earliest date, like this:

START me=node(1) 
MATCH me-[:FOLLOWS]->friends-[fr:ADDED|STOCKS]->product, 
WHERE not(me-->product)
RETURN product._id as id, min(fr.CreatedOn) as CreatedOn 
ORDER BY CreatedOn DESC 

Hth,

Andrés

like image 31
Andres Avatar answered Feb 08 '23 13:02

Andres