
Neo4j - Duplicates Despite Using Merge

I am attempting to populate a database using MERGE statements through Neo4jPHP. All of my queries use MERGE; however, separate nodes are created every time, resulting in massive duplication.

The queries are run within a single transaction. I've removed the surrounding code to focus on the queries:

$transaction = $client->beginTransaction();

while(...) {
    $pageQuery = new Query($client, 'MERGE (n:Page {url:"'.$page.'"}) SET n.title="'.$title.'"');
    $transaction->addStatements(array($pageQuery));

    $h1Query = new Query($client, 'MATCH (n:Page {url:"'.$page.'"}) SET n.h1s = "['.implode(", ", $h1s).']"');
    $transaction->addStatements(array($h1Query));

    $scriptQuery = new Query($client, 'MATCH (n:Page {url:"'.$page.'"}) MERGE (n)-[:CONTAINS_SCRIPT]->(s:Script {url:"'.$s.'"})');
    $transaction->addStatements(array($scriptQuery));

    $styleQuery = new Query($client, 'MATCH (n:Page {url:"'.$page.'"}) MERGE (n)-[:CONTAINS_STYLESHEET]->(s:StyleSheet {url:"'.$s.'"})');
    $transaction->addStatements(array($styleQuery));

    $otherPageQuery = new Query($client, 'MATCH (n:Page {url:"'.$page.'"}) MERGE (n)-[:LINKS_TO]->(m:Page {url:"'.$match.'"})');
    $transaction->addStatements(array($otherPageQuery));
}

$transaction->commit();

After running this across a couple of pages, I end up with 6 copies of the same Page node: one with the title and h1s properties, and the rest without.

I also tried using CREATE UNIQUE, but this gave an error that the syntax wasn't supported.

I am running Neo4j 2.0.1. Any suggestions?

randak asked Dec 25 '22 11:12


1 Answer

When you use MERGE on a pattern that includes relationships, Cypher matches or creates the entire pattern as a unit. If the whole pattern cannot be matched, the whole pattern is created, including every node in it that is not already bound to a variable.

For example:

MERGE (n:Page { url: "http://www.neo4j.org" })
RETURN n

This gets or creates the Page node whose url property is set to http://www.neo4j.org. Run on its own, this statement does not create a duplicate node.

Now assume that this node exists in the Neo4j database and we run the following query:

MERGE (n:Page { url: "http://www.neo4j.org" })-[:CONNECTED_TO]->(test:Test { id: "test" })
RETURN *

This attempts to match the entire pattern. If the full path does not exist, the entire path is created regardless of whether the Page node already exists, which leaves a second Page node with the same url.
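
To check whether this has already happened in a database, a query along these lines could list every url shared by more than one Page node. This is only a sketch; the aliases url and copies are illustrative names, not something from the question.

```cypher
// Group Page nodes by url and keep only the urls that occur more than once.
MATCH (n:Page)
WITH n.url AS url, count(n) AS copies
WHERE copies > 1
RETURN url, copies
ORDER BY copies DESC
```

If the query returns no rows, no two Page nodes share a url.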

To resolve your issue, make sure that you use MERGE to get or create your individual nodes first. Then you can use MERGE to get or create the relationship between the two nodes.

Example:

MERGE (n:Page { url: "http://www.neo4j.org" })
MERGE (s:StyleSheet { url: "http://www.neo4j.org/styles/main.css" })
MERGE (n)-[:CONTAINS_STYLESHEET]->(s)
RETURN *
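
Applied to the relationship types from the question, the same approach might look like the sketch below. The example.com URLs are placeholder values chosen for illustration; in the PHP loop they would come from $page, $s, and $match.

```cypher
// Get or create each node on its own, keyed by url.
MERGE (n:Page { url: "http://example.com/" })
MERGE (s:Script { url: "http://example.com/app.js" })
MERGE (m:Page { url: "http://example.com/about" })
// Then get or create the relationships between the nodes bound above.
MERGE (n)-[:CONTAINS_SCRIPT]->(s)
MERGE (n)-[:LINKS_TO]->(m)
RETURN *
```

Because n, s, and m are already bound when the relationship MERGE clauses run, only the relationships themselves are matched or created, so running the statement again should not add nodes.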

Kenny Bastani answered Jan 02 '23 07:01