neo4j find all nodes with matching properties

Tags: neo4j, cypher

I have a relatively large set of nodes, and I want to find all pairs of nodes that have matching property values, but I don't know or care in advance what the property value is. This is basically an attempt to find duplicate nodes, but I can limit the definition of a duplicate to two or more nodes that have the same property value.

Any ideas how to proceed? Not finding any starting points in the neo4j docs. I'm on 1.8.2 community edition.

EDIT
Sorry for not being clear in the initial question, but I'm talking about doing this through Cypher.

Paul asked May 29 '13 15:05


5 Answers

Cypher to count values on a property, returning a collection of nodes as well:

start n=node(*)
where has(n.prop)
with n.prop as prop, collect(n) as nodelist, count(*) as count
where count > 1
return prop, nodelist, count;

Example on console: http://console.neo4j.org/r/k2s7aa

You can also do an index scan with the property like so (to avoid looking at nodes that don't have this property):
start n=node:node_auto_index('prop:*') ...

The same in Neo4j 2.0 Cypher, matching on a label Label:

match (n:Label)
with n.prop as prop, collect(n) as nodelist, count(*) as count
where count > 1
return prop, nodelist, count;

Update for 3.x: has() was replaced by exists().

Eve Freeman answered Oct 12 '22 16:10


You can try this one, which I think does what you want.

START n=node(*), m=node(*)
WHERE
  HAS(n.name) AND HAS(m.name) AND
  n.name = m.name AND
  ID(n) < ID(m)
RETURN n, m

http://console.neo4j.org/?id=xe6wmt

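Both nodes must have a name property, and name must be equal for both nodes. Each matching pair would otherwise be returned twice, as (n, m) and (m, n), so the id comparison keeps only one of the two possibilities. Not sure about performance, since this compares every node against every other node. Please test.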

p3rnilla answered Oct 12 '22 18:10


What about the following approach:

  • Use getAllNodes to get an Iterable over all nodes.
  • Using getPropertyKeys and getProperty(key), build up a java.util.Map containing all properties of a node, then calculate the map's hashCode().
  • Build up a global Map using the hashCode as key and a set of node.getId() values as the value.

This should give you the candidate duplicates. Be aware of hashCode() semantics: nodes with different properties can map to the same hashCode, so compare the property maps of nodes that share a hash before treating them as duplicates.
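
A minimal sketch of the steps above in plain Java, to make the hashCode caveat concrete. It does not call the Neo4j API: each node is modelled as an entry of node id to property map, which is an assumption standing in for what getAllNodes, getPropertyKeys and getProperty(key) would give you. The class name DuplicateCandidates and the method findDuplicates are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Neo4j. Nodes are modelled as
// "node id -> property map"; with the real API you would fill this map
// while iterating over all nodes.
final class DuplicateCandidates {

    static List<Set<Long>> findDuplicates(Map<Long, Map<String, Object>> propertiesById) {
        // Step 1: bucket node ids by the hashCode of their property map.
        Map<Integer, List<Long>> buckets = new HashMap<>();
        for (Map.Entry<Long, Map<String, Object>> entry : propertiesById.entrySet()) {
            int hash = entry.getValue().hashCode();
            buckets.computeIfAbsent(hash, k -> new ArrayList<>()).add(entry.getKey());
        }

        // Step 2: a shared hashCode only makes nodes candidates, so confirm
        // with equals() by regrouping each bucket on the property map itself.
        // Note: array-valued properties use identity equals/hashCode in Java
        // and would need converting to lists before this comparison.
        List<Set<Long>> duplicates = new ArrayList<>();
        for (List<Long> bucket : buckets.values()) {
            Map<Map<String, Object>, Set<Long>> exact = new HashMap<>();
            for (Long id : bucket) {
                exact.computeIfAbsent(propertiesById.get(id), k -> new TreeSet<>()).add(id);
            }
            for (Set<Long> group : exact.values()) {
                if (group.size() > 1) {
                    duplicates.add(group);
                }
            }
        }
        return duplicates;
    }
}
```

The second pass matters in practice: the strings "Aa" and "BB" have the same hashCode in Java, so two nodes whose only property is name="Aa" and name="BB" land in the same bucket without being duplicates.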

Stefan Armbruster answered Oct 12 '22 18:10


Neo4j 3.1.1

HAS is no longer supported in Cypher; please use EXISTS instead.

If you want to find nodes with a specific property, the Cypher is as follows:

MATCH (n:NodeLabel) WHERE EXISTS(n.NodeProperty) RETURN n

KAIQI YUAN answered Oct 12 '22 17:10


With Neo4j 3.3.4 you can simply do the following:

MATCH (n) where EXISTS(n.propertyName) return n

Simply change propertyName to whatever property you are looking to find.

jediwompa answered Oct 12 '22 17:10