
Neo4j: Query to find the nodes with most relationships, and their connected nodes

I am using Neo4j CE 3.1.1 and I have a relationship WRITES between authors and books. I want to find the N (say N=10 for example) books with the largest number of authors. Following some examples I found, I came up with the query:

MATCH (a)-[r:WRITES]->(b)
RETURN r,
COUNT(r) ORDER BY COUNT(r) DESC LIMIT 10

When I execute this query in the Neo4j browser I get 10 books, but these do not look like the ones written by the most authors, as they show only a few WRITES relationships to authors. If I change the query to

MATCH (a)-[r:WRITES]->(b)
RETURN b,
COUNT(r) ORDER BY COUNT(r) DESC LIMIT 10

then I get the 10 books with the most authors, but I don't see their relationships to the authors. To see those, I have to write additional queries that explicitly name a book found by the previous query:

MATCH ()-[r:WRITES]->(b)
WHERE b.title="Title of a book with many authors"
RETURN r

What am I doing wrong? Why isn't the first query working as expected?

asked Feb 14 '17 at 23:02 by st1led




2 Answers

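Aggregations are grouped by the non-aggregation columns returned alongside them, and with your match each relationship occurs only once in the results.

So your first query asks for one row per relationship, together with the count of that particular relationship, which is always 1. Since every count is the same, the ORDER BY has nothing to sort on, and LIMIT 10 simply returns 10 arbitrary relationships.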
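To see the grouping rule at work, here is a small sketch on a hypothetical graph. The Book and Author labels, the title and name properties, and the sample values are assumptions for illustration, not taken from the question. It assumes an otherwise empty database, and each statement should be run separately in the Neo4j browser.

```cypher
// Hypothetical sample data: 'Book A' has three authors, 'Book B' has one.
CREATE (b1:Book {title: 'Book A'}), (b2:Book {title: 'Book B'}),
       (a1:Author {name: 'Ann'}), (a2:Author {name: 'Bob'}), (a3:Author {name: 'Cy'}),
       (a1)-[:WRITES]->(b1), (a2)-[:WRITES]->(b1), (a3)-[:WRITES]->(b1),
       (a1)-[:WRITES]->(b2);

// Grouping key is r: every relationship forms its own group,
// so this returns four rows and cnt is 1 on each of them.
MATCH (:Author)-[r:WRITES]->(:Book)
RETURN r, COUNT(r) AS cnt;

// Grouping key is b: one row per book.
// 'Book A' gets cnt = 3 and 'Book B' gets cnt = 1.
MATCH (:Author)-[r:WRITES]->(b:Book)
RETURN b.title AS title, COUNT(r) AS cnt
ORDER BY cnt DESC;
```

The only difference between the last two statements is the non-aggregated column, and that column alone decides what COUNT(r) is counted over.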

You might rewrite this in a couple of different ways.

One is to collect the authors and order on the size of the author list:

MATCH (a)-[:WRITES]->(b)
RETURN b, COLLECT(a) as authors
ORDER BY SIZE(authors) DESC LIMIT 10

You can always collect the author and its relationship, if the relationship itself is interesting to you.
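As a sketch of that variation (the aliases authors and rels are just names chosen here), the relationships can be collected next to the authors, so that each row carries a book, its authors, and the WRITES relationships connecting them:

```cypher
// One row per book; both lists are built from the same matched rows,
// so authors[i] and rels[i] belong together.
MATCH (a)-[r:WRITES]->(b)
RETURN b, COLLECT(a) AS authors, COLLECT(r) AS rels
ORDER BY SIZE(authors) DESC LIMIT 10
```

Because the relationships are part of the result, the Neo4j browser should be able to draw the books together with their authors without a follow-up query.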

EDIT

If you happen to have labels on your nodes (you absolutely SHOULD have labels on your nodes), you can try a different approach by matching to all books, getting the size of the incoming :WRITES relationships to each book, ordering and limiting on that, and then performing the match to the authors:

MATCH (b:Book)
WITH b, SIZE(()-[:WRITES]->(b)) as authorCnt
ORDER BY authorCnt DESC LIMIT 10
MATCH (a)-[:WRITES]->(b)
RETURN b, a

You can collect on the authors and/or return the relationship as well, depending on what you need from the output.
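For instance, a variant of the query above that keeps the count and returns the relationships too might look like the sketch below. It assumes the same :Book label, and the aliases authors and rels are hypothetical names.

```cypher
// Pick the 10 books with the most incoming WRITES relationships first...
MATCH (b:Book)
WITH b, SIZE(()-[:WRITES]->(b)) AS authorCnt
ORDER BY authorCnt DESC LIMIT 10
// ...then expand only those 10 books to their authors.
MATCH (a)-[r:WRITES]->(b)
RETURN b, authorCnt, COLLECT(a) AS authors, COLLECT(r) AS rels
ORDER BY authorCnt DESC
// Note: SIZE() over a pattern works on Neo4j 3.x as used in the question;
// as far as I know, Neo4j 5 expects a COUNT { ... } subquery instead.
```

Limiting before the second MATCH means only 10 books are expanded to their authors, rather than collecting authors for every book and discarding most of them.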

answered Sep 28 '22 at 00:09 by InverseFalcon


You are very close: after sorting and limiting the books, you need to match their authors again. For example:

MATCH (a:Author)-[r:WRITES]->(b:Book)
WITH b, COUNT(r) AS authorsCount
ORDER BY authorsCount DESC LIMIT 10
MATCH (b)<-[:WRITES]-(a:Author)
RETURN b, COLLECT(a) AS authors
ORDER BY SIZE(authors) DESC

answered Sep 28 '22 at 02:09 by stdob--