Let's say my nodes in Neo4j have a property "name". I want to enforce that there is at most one node per name by merging all nodes that share a name. More precisely: if there are three nodes whose name is "dog", I want them to be replaced by just one node with name "dog", which:
The background is this: in my graph there are often several nodes with the same name that should be considered "equal" (although some carry richer property information than others). Putting a.name = b.name in a WHERE clause is extremely slow.
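To make the slowness concrete: in the worst case, a pattern like MATCH (a), (b) WHERE a.name = b.name starts from the cartesian product of all nodes and only then filters, so n nodes mean n² candidate rows. The Java sketch below is a plain in-memory count, not Neo4j code, and the list of names is a hypothetical example. Grouping by name instead (a map keyed by name, or collect() per name in Cypher) touches each node once.

```java
import java.util.List;

public class PairCost {
    // Rows a pattern like MATCH (a), (b) starts from before WHERE is applied:
    // every node paired with every node (a cartesian product).
    static long cartesianRows(List<String> names) {
        long n = names.size();
        return n * n;
    }

    // Pairs that pass WHERE a.name = b.name AND ID(a) < ID(b),
    // using list positions as stand-ins for node IDs.
    static long matchingPairs(List<String> names) {
        long count = 0;
        for (int i = 0; i < names.size(); i++) {
            for (int j = i + 1; j < names.size(); j++) {
                if (names.get(i).equals(names.get(j))) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> names = List.of("dog", "dog", "dog", "cat");
        // prints "16 candidate rows, 3 matching pairs"
        System.out.println(cartesianRows(names) + " candidate rows, "
            + matchingPairs(names) + " matching pairs");
    }
}
```

Only 3 of the 16 rows are useful here, and the ratio gets worse as the graph grows.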
EDIT: I forgot to mention that I am currently on Neo4j 2.3.7 (and cannot upgrade).
SECOND EDIT: There is a known list of node labels and of possible relationship types, so the type of every node is known.
THIRD EDIT: I want to call the above "node collapse" procedure from Java, so a mixture of Cypher queries and procedural code would also be a useful solution.
I have made a test case with the following data:
CREATE (n1:TestX {name:'A', val1:1})
CREATE (n2:TestX {name:'B', val2:2})
CREATE (n3:TestX {name:'B', val3:3})
CREATE (n4:TestX {name:'B', val4:4})
CREATE (n5:TestX {name:'C', val5:5})
CREATE (n1)-[:TEST]->(n2)
CREATE (n5)<-[:TEST]-(n3);
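This produces five TestX nodes, three of which are named B: the A node points to the B node with val2, and the B node with val3 points to the C node.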
The three B nodes should really be one and the same node. Here is my solution:
//copy all properties
MATCH (n:TestX), (m:TestX) WHERE n.name = m.name AND ID(n)<ID(m) WITH n, m SET n += m;
//copy all outgoing relations
MATCH (n:TestX), (m:TestX)-[r:TEST]->(endnode) WHERE n.name = m.name AND ID(n)<ID(m) WITH n, collect(endnode) as endnodes
FOREACH (x in endnodes | CREATE (n)-[:TEST]->(x));
//copy all incoming relations
MATCH (n:TestX), (m:TestX)<-[r:TEST]-(endnode) WHERE n.name = m.name AND ID(n)<ID(m) WITH n, collect(endnode) as endnodes
FOREACH (x in endnodes | CREATE (n)<-[:TEST]-(x));
//delete duplicates
MATCH (n:TestX), (m:TestX) WHERE n.name = m.name AND ID(n)<ID(m) detach delete m;
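To check what these four statements do, here is a small in-memory Java model of them, run on the test data above. The Node and Rel records are hypothetical stand-ins, not the Neo4j API, and IDs 1 to 5 are assumed to follow creation order. Each statement is modeled as a separate pass, so relationships created by the outgoing copy are visible to the incoming copy. Records need Java 16 or later.

```java
import java.util.*;

public class CollapseModel {
    // Hypothetical stand-ins for Neo4j nodes and :TEST relationships.
    record Node(long id, Map<String, Object> props) {}
    record Rel(long from, long to) {}

    static String name(Node n) { return (String) n.props().get("name"); }

    static void collapse(List<Node> nodes, List<Rel> rels) {
        // Pairs (n, m) with n.name = m.name AND ID(n) < ID(m), as in the WHERE clauses.
        List<Node[]> pairs = new ArrayList<>();
        for (Node n : nodes)
            for (Node m : nodes)
                if (name(n).equals(name(m)) && n.id() < m.id())
                    pairs.add(new Node[] {n, m});

        // Step 1: SET n += m (keys present on m overwrite n's values).
        for (Node[] p : pairs) p[0].props().putAll(p[1].props());

        // Step 2: copy outgoing relationships of m to n.
        List<Rel> snapshot = new ArrayList<>(rels);
        for (Node[] p : pairs)
            for (Rel r : snapshot)
                if (r.from() == p[1].id()) rels.add(new Rel(p[0].id(), r.to()));

        // Step 3: copy incoming relationships of m to n.
        snapshot = new ArrayList<>(rels);
        for (Node[] p : pairs)
            for (Rel r : snapshot)
                if (r.to() == p[1].id()) rels.add(new Rel(r.from(), p[0].id()));

        // Step 4: DETACH DELETE every m, i.e. every node with a lower-ID namesake.
        Set<Long> doomed = new HashSet<>();
        for (Node[] p : pairs) doomed.add(p[1].id());
        nodes.removeIf(n -> doomed.contains(n.id()));
        rels.removeIf(r -> doomed.contains(r.from()) || doomed.contains(r.to()));
    }

    static Node node(long id, Object... kv) {
        Map<String, Object> props = new HashMap<>();
        for (int i = 0; i < kv.length; i += 2) props.put((String) kv[i], kv[i + 1]);
        return new Node(id, props);
    }

    // The test graph from the question.
    static List<Node> testNodes() {
        return new ArrayList<>(List.of(
            node(1, "name", "A", "val1", 1),
            node(2, "name", "B", "val2", 2),
            node(3, "name", "B", "val3", 3),
            node(4, "name", "B", "val4", 4),
            node(5, "name", "C", "val5", 5)));
    }

    static List<Rel> testRels() {
        return new ArrayList<>(List.of(new Rel(1, 2), new Rel(3, 5)));
    }

    public static void main(String[] args) {
        List<Node> nodes = testNodes();
        List<Rel> rels = testRels();
        collapse(nodes, rels);
        System.out.println(nodes.size());                         // 3
        System.out.println(new TreeMap<>(nodes.get(1).props()));  // {name=B, val2=2, val3=3, val4=4}
        System.out.println(rels); // [Rel[from=1, to=2], Rel[from=2, to=5]]
    }
}
```

The nested loop over all node pairs mirrors the cartesian MATCH in the Cypher, so this model shares its quadratic cost.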
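With the test data above, this leaves three nodes: A, a single B that now carries val2, val3 and val4, and C, with A pointing to B and B pointing to C.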
Note that you have to know the relationship types in advance (here only TEST), and that relationship properties are not copied.
All properties are copied from the nodes with "higher" IDs to the node with the lowest ID, which is the one that survives.
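One consequence worth spelling out: if two duplicates disagree on the same key, SET n += m lets m's value replace n's, so the surviving value depends on the order in which the rows are applied, and Cypher does not guarantee row order without ORDER BY. The tiny Java model below shows the overwrite rule; the key "color" is a hypothetical example, not part of the test data.

```java
import java.util.*;

public class MergeOrder {
    // Mimics SET n += m for each duplicate m, applied in the given order:
    // every key present on m replaces the survivor's value.
    static Map<String, Object> mergeInto(Map<String, Object> survivor,
                                         List<Map<String, Object>> duplicates) {
        Map<String, Object> result = new TreeMap<>(survivor);
        for (Map<String, Object> m : duplicates) {
            result.putAll(m);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> n = Map.of("name", "dog");
        Map<String, Object> m1 = Map.of("name", "dog", "color", "brown");
        Map<String, Object> m2 = Map.of("name", "dog", "color", "black");
        System.out.println(mergeInto(n, List.of(m1, m2))); // {color=black, name=dog}
        System.out.println(mergeInto(n, List.of(m2, m1))); // {color=brown, name=dog}
    }
}
```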
I think you need something like synonym nodes.
1) Go through all nodes and create one synonym node per name:
MATCH (N)
WHERE N.name IS NOT NULL AND NOT N:Synonym
MERGE (S:Synonym {name: N.name})
MERGE (S)<-[:hasSynonym]-(N)
RETURN count(S);
2) Remove the synonyms with only one node:
MATCH (S:Synonym)<-[:hasSynonym]-(N)
WITH S, count(N) AS members
WHERE members = 1
DETACH DELETE S;
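Steps 1 and 2 amount to grouping nodes by name and discarding groups of one. A minimal in-memory Java model of that, reusing the IDs and names of the earlier test data (IDs assumed to be 1 to 5, not real Neo4j IDs), might look like this:

```java
import java.util.*;

public class SynonymGroups {
    // Step 1: one synonym bucket per distinct name, holding the IDs of the
    // nodes that would get a :hasSynonym relationship to it.
    static Map<String, List<Long>> buildSynonyms(Map<Long, String> nameById) {
        Map<String, List<Long>> synonyms = new TreeMap<>();
        nameById.forEach((id, name) ->
            synonyms.computeIfAbsent(name, k -> new ArrayList<>()).add(id));
        return synonyms;
    }

    // Step 2: drop synonyms that have only one member.
    static void dropSingletons(Map<String, List<Long>> synonyms) {
        synonyms.values().removeIf(members -> members.size() == 1);
    }

    public static void main(String[] args) {
        // TreeMap so IDs are visited in ascending order.
        Map<Long, String> nameById = new TreeMap<>(Map.of(
            1L, "A", 2L, "B", 3L, "B", 4L, "B", 5L, "C"));
        Map<String, List<Long>> synonyms = buildSynonyms(nameById);
        dropSingletons(synonyms);
        System.out.println(synonyms); // {B=[2, 3, 4]}
    }
}
```

Only the B group survives step 2, so only its three nodes are merged in the next step.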
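3) Move the properties and relationships of each remaining group onto its synonym node, using APOC (a procedure library that needs Neo4j 3.0 or later, so this step will not run on 2.3.7):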
MATCH (S:Synonym)<-[:hasSynonym]-(N)
WITH S, collect(N) AS members
CALL apoc.refactor.mergeNodes([S] + members) YIELD node
RETURN count(node);
4) Remove the Synonym label:
MATCH (S:Synonym)
// also delete any hasSynonym relationships still attached to the merged node
OPTIONAL MATCH (S)-[r:hasSynonym]-()
DELETE r
REMOVE S:Synonym;