I have a relatively large set of nodes, and I want to find all pairs of nodes that have matching property values, but I don't know or care in advance what the property value is. This is basically an attempt to find duplicate nodes, but I can limit the definition of a duplicate to two or more nodes that have the same property value.
Any ideas how to proceed? Not finding any starting points in the neo4j docs. I'm on 1.8.2 community edition.
EDIT
Sorry for not being clear in the initial question, but I'm talking about doing this through Cypher.
Cypher to count values on a property, returning a collection of nodes as well:
start n=node(*)
where has(n.prop)
with n.prop as prop, collect(n) as nodelist, count(*) as count
where count > 1
return prop, nodelist, count;
Example on console: http://console.neo4j.org/r/k2s7aa
You can also do an index scan on the property, to avoid looking at nodes that don't have it:
start n=node:node_auto_index('prop:*') ...
2.0 Cypher with a label Label:
match (n:Label)
with n.prop as prop, collect(n) as nodelist, count(*) as count
where count > 1
return prop, nodelist, count;
Update for 3.x: has was replaced by exists.
You can try this one, which I think does what you want.
START n=node(*), m=node(*)
WHERE
HAS(n.name) AND HAS (m.name) AND
n.name=m.name AND
ID(n) < ID(m)
RETURN n, m
http://console.neo4j.org/?id=xe6wmt
Both nodes must have a name property, and name must be equal for both nodes. Each match would otherwise show up twice, as (n, m) and (m, n), so the id comparison keeps only one of the two possibilities. Not sure about performance, since this compares every node with every other node. Please test.
What about the following approach:
Iterate over all nodes.
For each node, build a java.util.Map containing all properties for that node, and calculate the map's hashCode().
Build a second Map using the hashCode as key and a set of node.getId() as values.
This should give you the candidates for being duplicates. Be aware of the hashCode() semantics: nodes with different properties can map to the same hashCode.
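A minimal sketch of that idea in plain Java, assuming the node ids and property maps have already been read out of the graph into an in-memory map (the class name DuplicateCandidates and the methods props, bucketByHash and confirmedDuplicates are made up for illustration, and no Neo4j API is called here). It buckets ids by the property map's hashCode(), then regroups each bucket with equals() so that colliding hash codes are not reported as duplicates. Note that array-valued properties would need extra handling, since Java arrays compare by identity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class DuplicateCandidates {

    // Hypothetical helper: builds a one-entry property map for the demo.
    public static Map<String, Object> props(String key, Object value) {
        Map<String, Object> m = new HashMap<>();
        m.put(key, value);
        return m;
    }

    // Step 1: bucket node ids by the hashCode() of their property map.
    // The outer map stands in for "node.getId() -> all properties of that node".
    public static Map<Integer, Set<Long>> bucketByHash(Map<Long, Map<String, Object>> propsById) {
        Map<Integer, Set<Long>> buckets = new HashMap<>();
        for (Map.Entry<Long, Map<String, Object>> e : propsById.entrySet()) {
            buckets.computeIfAbsent(e.getValue().hashCode(), k -> new TreeSet<>()).add(e.getKey());
        }
        return buckets;
    }

    // Step 2: a shared hash code only makes nodes candidates, so regroup each
    // bucket by the property map itself (equals()) to drop hash collisions.
    public static List<Set<Long>> confirmedDuplicates(Map<Long, Map<String, Object>> propsById) {
        List<Set<Long>> result = new ArrayList<>();
        for (Set<Long> bucket : bucketByHash(propsById).values()) {
            if (bucket.size() < 2) {
                continue; // a single node cannot be a duplicate
            }
            Map<Map<String, Object>, Set<Long>> exact = new HashMap<>();
            for (Long id : bucket) {
                exact.computeIfAbsent(propsById.get(id), k -> new TreeSet<>()).add(id);
            }
            for (Set<Long> group : exact.values()) {
                if (group.size() > 1) {
                    result.add(group);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Map<String, Object>> propsById = new HashMap<>();
        propsById.put(1L, props("name", "Alice"));
        propsById.put(2L, props("name", "Alice")); // true duplicate of node 1
        propsById.put(3L, props("name", "Aa"));
        propsById.put(4L, props("name", "BB"));    // "Aa" and "BB" share a String hashCode
        System.out.println("Candidate buckets: " + bucketByHash(propsById).values());
        System.out.println("Confirmed duplicates: " + confirmedDuplicates(propsById));
    }
}
```

Nodes 3 and 4 land in the same hash bucket even though their properties differ, which is exactly the collision case to watch out for. The equals() pass in confirmedDuplicates is what filters them out.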
Neo4j 3.1.1
HAS is no longer supported in Cypher, please use EXISTS instead.
If you want to find nodes with a specific property, the Cypher is as follows:
MATCH (n:NodeLabel) WHERE EXISTS(n.NodeProperty) RETURN n
With Neo4j 3.3.4 you can simply do the following:
MATCH (n) where EXISTS(n.propertyName) return n
Simply change propertyName to whatever property you are looking to find.