neo4j - how to match only first n relations

Tags:

neo4j

cypher

Is there a built-in way to match only the first n relationships, other than filtering with LIMIT n afterwards?

I have this query:

START n=node({id})
MATCH n--u--n2
RETURN u, count(*) as cnt order by cnt desc limit 10;

But assuming the number of n--u relationships is very high, I want to relax this query and take, for example, the first 100 random relationships and then continue with u--n2...

This is for a collaborative filtering task. Assuming the users are more or less similar, I don't want to match all users u, only a random subset. This approach should be faster: the query currently takes ~500 ms, and I would like to get it under 50 ms.

I know I could break the above query into two separate ones, but the first query would still go through all users and only limit the output afterwards. I want to limit the maximum number of relationships during the match phase.

ulkas asked Apr 25 '13 13:04


2 Answers

You can pipe the current results of your query using WITH, then LIMIT those initial results, and then continue on in the same query:

START n=node({id})
MATCH n--u
WITH u
LIMIT 10
MATCH u--n2
RETURN u, count(*) as cnt 
ORDER BY cnt desc 
LIMIT 10;

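The query above takes the first 10 u nodes found and then continues matching u--n2 from only those. Note that LIMIT keeps whichever rows arrive first, not a random sample.

Because the RETURN groups by u, there is at most one result row per u, so with only 10 u nodes the second LIMIT 10 never trims anything and can be left off. It would matter only if the first LIMIT were larger (for example 100, as in the question) or if you returned n2 instead of u.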
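To make the limit-then-expand idea concrete, here is a minimal plain-Java model of what a WITH ... LIMIT pipeline does. It is not Neo4j code: the adjacency map, the class LimitedExpansion, and the method topByCount are hypothetical names used only for illustration, and the sketch assumes each neighbour appears once in a node's list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model of "limit first, then expand". NOT the Neo4j API;
// all names here are hypothetical and exist only for illustration.
class LimitedExpansion {

    // Returns (u, cnt) pairs for at most firstLimit neighbours of n,
    // sorted by cnt descending and cut to topK rows.
    static List<Map.Entry<String, Integer>> topByCount(
            Map<String, List<String>> adjacency, String n, int firstLimit, int topK) {
        List<String> neighbours = adjacency.getOrDefault(n, List.of());

        List<Map.Entry<String, Integer>> rows = new ArrayList<>();
        // Counterpart of "WITH u LIMIT firstLimit": only the first few neighbours are kept.
        for (String u : neighbours.subList(0, Math.min(firstLimit, neighbours.size()))) {
            // Counterpart of "MATCH u--n2 ... count(*)": count the neighbours of u.
            rows.add(Map.entry(u, adjacency.getOrDefault(u, List.of()).size()));
        }

        // Counterpart of "ORDER BY cnt DESC LIMIT topK".
        rows.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return new ArrayList<>(rows.subList(0, Math.min(topK, rows.size())));
    }
}
```

With firstLimit = 100 and topK = 10 this mirrors the sampling the question asks for: neighbours beyond the first 100 are never expanded, which is where the saving would come from.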

ean5533 answered Oct 21 '22 23:10


This is not a direct solution to your question, but since I ran into a similar problem, my workaround might be interesting for you.

What I need to do is get relationships by index (which might yield many thousands) and then get their start node. Since the start node is always the same for that index query, I only need the start node of the very first relationship.

Since I wasn't able to achieve that with Cypher (the query proposed by ean5533 does not perform any better), I am using a simple unmanaged extension (nice template).

@GET
@Path("/address/{address}")
public Response getUniqueIDofSenderAddress(@PathParam("address") String addr, @Context GraphDatabaseService graphDB) throws IOException
{
    try {
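        // Look up the relationships for this address in the "transactions" relationship index.
        RelationshipIndex index = graphDB.index().forRelationships("transactions");
        IndexHits<Relationship> rels = index.get("sender_address", addr);

        int unique_id = -1; // stays -1 if the index returns no hits
        for (Relationship rel : rels) {
            // Every hit has the same start node, so only the first one is read.
            Node sender = rel.getStartNode();
            unique_id = (Integer) sender.getProperty("unique_id");
            rels.close(); // release the remaining hits without iterating them
            break;
        }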

        return Response.ok().entity("Unique ID: " + unique_id).build();
    } catch (Exception e) {
        return Response.serverError().entity("Could not get unique ID.").build();
    }
}

In this case, the speed-up is quite nice.

I don't know your exact use case, but since Neo4j even supports HTTP streaming afaik, you should be able to convert your query to an unmanaged extension and still get the full performance. E.g., "java-querying" all your qualifying nodes and emitting the partial results to the HTTP stream.
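The idea of emitting partial results while stopping early can be sketched in plain Java as follows. This is only an illustration under assumptions: PartialResultStreamer and stream are hypothetical names, the hits are plain strings rather than Neo4j nodes, and in a real unmanaged extension the Writer would presumably wrap the HTTP response stream (for instance inside a JAX-RS StreamingOutput).

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Iterator;

// Hypothetical helper, not part of Neo4j: streams the first few hits and stops.
class PartialResultStreamer {

    // Writes at most maxRows hits, one per line, flushing after each row so a
    // client can start reading before the loop ends. Returns the rows written.
    static int stream(Iterator<String> hits, int maxRows, Writer out) throws IOException {
        int written = 0;
        // The limit is checked before hasNext(), so no extra hit is pulled
        // from the underlying iterator once maxRows is reached.
        while (written < maxRows && hits.hasNext()) {
            out.write(hits.next());
            out.write('\n');
            out.flush();
            written++;
        }
        return written;
    }
}
```

Because the iterator is consumed lazily, hits beyond maxRows are never touched, which is the same effect the extension above gets by breaking out of the loop after the first relationship.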

Bouncner answered Oct 21 '22 23:10