
neo4j: Replace multiple nodes with same property by one node

Let's say my nodes in neo4j have a property "name". Now I want to enforce that there is at most one node for a given name by merging all nodes that share the same name. More precisely: if there are three nodes where name is "dog", I want them to be replaced by just one node with name "dog", which:

  1. Gathers all properties from all the original three nodes.
  2. Has all arcs that were attached to the original three nodes.

The background for this is the following: in my graph, there are often several nodes with the same name which should be considered "equal" (although some have richer property information than others). Putting a.name = b.name in a WHERE clause is extremely slow.

EDIT: I forgot to mention that my Neo4j is of version 2.3.7 currently (I cannot update it).

SECOND EDIT: There is a known list of labels for the nodes and for the possible arcs. The type of the nodes is known.

THIRD EDIT: I want to call the above "node collapse" procedure from Java, so a mixture of Cypher queries and procedural code would also be a useful solution.

J Fabian Meier asked Aug 10 '16 11:08


2 Answers

I made a test case with the following data:

CREATE (n1:TestX {name:'A', val1:1})
CREATE (n2:TestX {name:'B', val2:2})
CREATE (n3:TestX {name:'B', val3:3})
CREATE (n4:TestX {name:'B', val4:4})
CREATE (n5:TestX {name:'C', val5:5})

MATCH (n6:TestX {name:'A', val1:1}) MATCH (m7:TestX {name:'B', val2:2}) CREATE (n6)-[:TEST]->(m7)
MATCH (n8:TestX {name:'C', val5:5}) MATCH (m10:TestX {name:'B', val3:3}) CREATE (n8)<-[:TEST]-(m10)

In the resulting graph, the three B nodes are really the same node. Here is my solution:

//copy all properties
MATCH (n:TestX), (m:TestX) WHERE n.name = m.name AND ID(n)<ID(m) WITH n, m SET n += m;

//copy all outgoing relations
MATCH (n:TestX), (m:TestX)-[r:TEST]->(endnode) WHERE n.name = m.name AND ID(n)<ID(m) WITH n, collect(endnode) as endnodes
FOREACH (x in endnodes | CREATE (n)-[:TEST]->(x));

//copy all incoming relations
MATCH (n:TestX), (m:TestX)<-[r:TEST]-(endnode) WHERE n.name = m.name AND ID(n)<ID(m) WITH n, collect(endnode) as endnodes
FOREACH (x in endnodes | CREATE (n)<-[:TEST]-(x));

//delete duplicates
MATCH (n:TestX), (m:TestX) WHERE n.name = m.name AND ID(n)<ID(m) DETACH DELETE m;

The result is a single B node that carries all the properties and relationships of the original three.

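Note that you have to know the types of the relationships involved, because each type needs its own pair of copy queries.

Properties are always copied from the node with the higher ID to the node with the lower ID, so in each group of duplicates the node with the lowest ID is the one that survives.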
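
Since the relationship types are known in advance, the copy queries can be generated once per type from Java. The following is only a minimal sketch: `CollapseQueries` and its methods are hypothetical names, not part of Neo4j, and the label and type names are concatenated into the query text (Cypher cannot take them as parameters), so they should come from the trusted, known list only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Neo4j API): builds the four kinds of statements
// from this answer for one label and a known list of relationship types.
class CollapseQueries {

    // Pairs every node n with each same-named node m that has a higher ID.
    private static String pair(String label, String mPattern) {
        return "MATCH (n:" + label + "), " + mPattern
             + " WHERE n.name = m.name AND ID(n) < ID(m) ";
    }

    static List<String> build(String label, List<String> relTypes) {
        String m = "(m:" + label + ")";
        List<String> queries = new ArrayList<>();
        // 1. copy all properties onto the lower-ID node
        queries.add(pair(label, m) + "SET n += m");
        for (String t : relTypes) {
            // 2. copy outgoing relationships of this type
            queries.add(pair(label, m + "-[:" + t + "]->(e)")
                + "WITH n, collect(e) AS es "
                + "FOREACH (x IN es | CREATE (n)-[:" + t + "]->(x))");
            // 3. copy incoming relationships of this type
            queries.add(pair(label, m + "<-[:" + t + "]-(e)")
                + "WITH n, collect(e) AS es "
                + "FOREACH (x IN es | CREATE (n)<-[:" + t + "]-(x))");
        }
        // 4. delete the higher-ID duplicates together with their relationships
        queries.add(pair(label, m) + "DETACH DELETE m");
        return queries;
    }

    public static void main(String[] args) {
        // Prints the statements for the test case above, in execution order.
        for (String q : build("TestX", Arrays.asList("TEST"))) {
            System.out.println(q + ";");
        }
    }
}
```

The statements have to be executed in the order returned, because the delete step must come last. In embedded mode each string could presumably be passed to `GraphDatabaseService.execute(String)`; that call site is an assumption about your setup, not something tested here.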

K.E. answered Oct 18 '22 08:10


I think you need something like a synonym node for each name.

1) Go through all nodes and create a synonym node for each name:

MATCH (N)
WITH N
  MERGE (S:Synonym {name: N.name})
  MERGE (S)<-[:hasSynonym]-(N)
RETURN count(S);

2) Remove the synonyms with only one node:

MATCH (S:Synonym)
WITH S
MATCH (S)<-[:hasSynonym]-(N)
WITH S, count(N) as count
WITH S WHERE count = 1
DETACH DELETE S;

3) Transfer properties and relationships for the remaining synonyms (with APOC, which needs Neo4j 3.0 or later because it relies on stored procedures):

MATCH (S:Synonym)
WITH S
MATCH (S)<-[:hasSynonym]-(N)
WITH [S] + collect(N) as nodesForMerge
CALL apoc.refactor.mergeNodes( nodesForMerge ) YIELD node
RETURN count(node);

4) Remove Synonym label:

MATCH (S:Synonym)<-[:hasSynonym]-(N)
CALL apoc.create.removeLabels( [S], ['Synonym'] ) YIELD node
RETURN count(node);
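
The grouping done in steps 1 and 2 can also be carried out on the Java side, for example on (ID, name) pairs fetched with a simple query. The following is a rough sketch using plain collections only; `DuplicateGroups` and `find` are hypothetical names, and how the pairs are loaded from the database is left open.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, plain Java only: groups node IDs by name and keeps
// only the groups that actually contain duplicates.
class DuplicateGroups {

    // idToName maps a node ID to the value of its "name" property.
    // Returns name -> ascending node IDs, only for names shared by 2+ nodes.
    static Map<String, List<Long>> find(Map<Long, String> idToName) {
        Map<String, List<Long>> byName = new LinkedHashMap<>();
        // Copying into a TreeMap iterates the IDs in ascending order,
        // so every ID list ends up sorted.
        for (Map.Entry<Long, String> e : new TreeMap<>(idToName).entrySet()) {
            byName.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        // Names used by a single node need no merging (step 2 above).
        byName.values().removeIf(ids -> ids.size() < 2);
        return byName;
    }

    public static void main(String[] args) {
        Map<Long, String> nodes = new TreeMap<>();
        nodes.put(0L, "A");
        nodes.put(1L, "B");
        nodes.put(2L, "B");
        nodes.put(3L, "B");
        nodes.put(4L, "C");
        System.out.println(find(nodes));
    }
}
```

Each returned list names one group to collapse; the first ID in a list could serve as the surviving node and the rest as the nodes to merge into it.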

stdob-- answered Oct 18 '22 09:10