
Neo4j: MERGE creates duplicate nodes

My database model has users and MAC addresses. A user can have multiple MAC addresses, but a MAC can only belong to one user. If some user sets his MAC and that MAC is already linked to another user, the existing relationship is removed and a new relationship is created between the new owner and that MAC. In other words, a MAC moves between users.

This is a particular instance of the Cypher query I'm using to assign MAC addresses:

MATCH (new:User { Id: 2 })
MERGE (mac:MacAddress { Value: "D857EFEF1CF6" })
WITH new, mac
OPTIONAL MATCH ()-[oldr:MAC_ADDRESS]->(mac)
DELETE oldr
MERGE (new)-[:MAC_ADDRESS]->(mac)

The query runs fine in my tests, but in production, for some strange reason, it sometimes creates duplicate MacAddress nodes (and a new relationship between the user and each of those nodes). That is, a particular user can have multiple MacAddress nodes with the same Value.

I can tell they are different nodes because they have different node IDs. I'm also sure the Values are exactly the same, because running collect(distinct mac.Value) over them returns a collection with one element. The query above is the only one in the code that creates MacAddress nodes.
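To see how widespread the duplication is, a query along these lines groups MacAddress nodes by Value and lists every value held by more than one node, together with the internal node IDs. It is a sketch that uses only the label and property names from the question; the aliases `value`, `copies` and `ids` are arbitrary.

```cypher
// Group MacAddress nodes by their Value property
MATCH (mac:MacAddress)
WITH mac.Value AS value, count(mac) AS copies, collect(id(mac)) AS ids
// Keep only values that exist on more than one node
WHERE copies > 1
RETURN value, copies, ids
ORDER BY copies DESC
```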

I'm using Neo4j 2.1.2. What's going on here?

Thanks, Jan

asked Sep 25 '14 at 19:09 by Jan Van den bosch



1 Answer

Are you sure this is the entirety of the queries you're running? MERGE has a really common pitfall: it merges the entire pattern you give it as a single unit. So here's what people expect:

neo4j-sh (?)$ MERGE (mac:MacAddress { Value: "D857EFEF1CF6" });
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1
Properties set: 1
Labels added: 1
1650 ms
neo4j-sh (?)$ MERGE (mac:MacAddress { Value: "D857EFEF1CF6" });
+--------------------------------------------+
| No data returned, and nothing was changed. |
+--------------------------------------------+
17 ms
neo4j-sh (?)$ match (mac:MacAddress { Value: "D857EFEF1CF6" }) return count(mac);
+------------+
| count(mac) |
+------------+
| 1          |
+------------+
1 row
200 ms

So far, so good. That's what we expect. Now watch this:

neo4j-sh (?)$ MERGE (mac:MacAddress { Value: "D857EFEF1CF6" })-[r:foo]->(b:SomeNode {label: "Foo!"});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 2
Relationships created: 1
Properties set: 2
Labels added: 2
178 ms
neo4j-sh (?)$ match (mac:MacAddress { Value: "D857EFEF1CF6" }) return count(mac);
+------------+
| count(mac) |
+------------+
| 2          |
+------------+
1 row
2 ms

Wait, WTF happened here? We specified the same MAC address again, so why was a duplicate created?

The documentation on MERGE states that "MERGE will not partially use existing patterns": it's all or nothing. "If partial matches are needed, this can be accomplished by splitting a pattern up into multiple MERGE clauses." Because the whole path did not already exist when we ran this path MERGE, Neo4j created everything in it, including a duplicate MAC address node.
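Following the documentation's advice, the path from the example above can be split into one MERGE per node plus one for the relationship. This is a sketch that reuses the transcript's throwaway `SomeNode` label, `label` property and `foo` relationship type. Run against a database that holds a single MacAddress node with this Value, it should match that node instead of creating a second one, so the count query would still return 1.

```cypher
// Each node is merged on its own: an existing node is matched, a missing one is created
MERGE (mac:MacAddress { Value: "D857EFEF1CF6" })
MERGE (b:SomeNode { label: "Foo!" })
// Only the relationship between the two bound nodes is created if it is missing
MERGE (mac)-[r:foo]->(b)
```

Note that each MERGE is now evaluated independently, so an existing `SomeNode { label: "Foo!" }` is reused as well; that is usually the intent, but it matters if the end node is supposed to be unique per path.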

There are frequently questions about duplicated nodes created by MERGE, and 99 times out of 100, this is what's going on.
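In terms of the question's model, a hypothetical single-clause variant like the one below would produce exactly the reported symptom: whenever user 2 is not yet linked to that MAC, the whole path is missing, so a new MacAddress node with the same Value is created even though one already exists. The query in the question already uses the split form (MERGE the MacAddress node first, then MERGE the relationship), so whether some other code path sends a query shaped like this one is an assumption to check, not something the question shows.

```cypher
// Hypothetical path MERGE: the MacAddress node is part of the merged pattern,
// so it is created again whenever this exact user-to-MAC path does not exist yet
MATCH (new:User { Id: 2 })
MERGE (new)-[:MAC_ADDRESS]->(mac:MacAddress { Value: "D857EFEF1CF6" })
```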

answered Oct 01 '22 at 20:10 by FrobberOfBits