
Cypher count multiple nodes in pattern

Tags: neo4j, cypher

I have a problem with a Cypher query on my Neo4j instance.

I have the following graph structure:

(d:Document)-->(t:Token)-->(l:Lemma)

A Document can have outgoing relationships to many Tokens, whereas a Token always has exactly one incoming relationship from a Document. A Token always has exactly one outgoing relationship to a Lemma, whereas a Lemma can have incoming relationships from multiple Tokens.

So the cardinalities are [Document]-n-1-[Token]-1-m-[Lemma].

For each Document in a given list documentIds, I want to count the number of distinct Tokens and distinct Lemmata in this pattern and divide the latter by the former. Each Lemma can be connected to multiple Tokens in the pattern, and such a Lemma should be counted only once.

My query so far looks like this:

MATCH (d:DOCUMENT)--(t:TOKEN)--(l:LEMMA)
WHERE d.id in {documentIds}
WITH d, count(DISTINCT l)/count(DISTINCT t) AS ttr
RETURN d.id AS id, ttr

I have the feeling that this counts the Lemmata and Tokens across all documents instead of counting them for each document separately. Also, in my result ttr is 0.0 for each d.id.

I don't know if there is a way for me to provide you my database content. Is there some obvious mistake in the query?

EDIT:
I created a console: http://console.neo4j.org/r/yqtrbx

In this case there are two Documents whose Tokens share one Lemma in common. For this graph I want the result to be 2/3 for the document with id 10023 and 2/2 for the document with id 10050. In a full document the difference between the Token count and the Lemma count is usually much higher.
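
As a sanity check on these expected values, here is a minimal Python sketch that recomputes them outside the database. The token and lemma ids are hypothetical placeholders, not the actual console data; only the document ids and the counts follow the description above.

```python
from collections import defaultdict

# Hypothetical (document id, token id, lemma id) paths standing in for the
# console graph; the token and lemma ids are invented for illustration.
paths = [
    (10023, "t1", "l1"),
    (10023, "t2", "l1"),  # two Tokens of one Document share Lemma l1
    (10023, "t3", "l2"),
    (10050, "t4", "l2"),  # Lemma l2 is shared with Document 10023
    (10050, "t5", "l3"),
]


def type_token_ratio(paths):
    """Return {document id: distinct lemmas / distinct tokens}."""
    tokens = defaultdict(set)
    lemmas = defaultdict(set)
    for doc_id, token_id, lemma_id in paths:
        # Sets drop duplicates, so a repeated Lemma is counted once per Document.
        tokens[doc_id].add(token_id)
        lemmas[doc_id].add(lemma_id)
    return {doc_id: len(lemmas[doc_id]) / len(tokens[doc_id]) for doc_id in tokens}


ratios = type_token_ratio(paths)
print(ratios)  # {10023: 0.6666666666666666, 10050: 1.0}
```

The sets are kept per document id, so a Lemma shared between two Documents is counted once in each of them.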

Hannes asked May 28 '26 02:05


1 Answer

The counts are already computed per document, because d in the WITH clause acts as the grouping key. The problem is that you are dividing two integers, so the result is an integer as well. The division 2/3 therefore yields 0 instead of the expected 0.67. To fix this, convert one of the integers to a float with toFloat:

match (d:DOCUMENT)-->(t:TOKEN)-->(l:LEMMA)
with d, count(distinct l) as cl, count(distinct t) as ct
return d, cl, ct, cl / toFloat(ct)

The result will be (based on your data set):

╒════════════╤════╤════╤══════════════════╕
│"d"         │"cl"│"ct"│"cl / toFloat(ct)"│
╞════════════╪════╪════╪══════════════════╡
│{"id":10050}│2   │2   │1                 │
├────────────┼────┼────┼──────────────────┤
│{"id":10023}│2   │3   │0.6666666666666666│
└────────────┴────┴────┴──────────────────┘
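
The same truncation can be reproduced outside Neo4j. The short Python sketch below is only an analogy: Python's `//` operator stands in for Cypher's integer division (both give the same result for non-negative operands), and `float()` plays the role of `toFloat()`. The variable names are made up for illustration.

```python
distinct_lemmas = 2  # cl for document 10023 in the table above
distinct_tokens = 3  # ct for document 10023 in the table above

# Floor division stands in for Cypher's integer / integer behaviour.
truncated = distinct_lemmas // distinct_tokens

# Converting one operand to a float first, as toFloat(ct) does, keeps the fraction.
ratio = distinct_lemmas / float(distinct_tokens)

print(truncated)  # 0
print(ratio)      # 0.6666666666666666
```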
Bruno Peres answered May 31 '26 08:05