Neo4j Merge doesn't use unique constraint index

Tags: neo4j

Neo4j Version 2.2.4

I use LOAD CSV to import a huge collection of nodes and relationships, and MERGE to get or create the nodes. For performance, I also created a unique constraint (which is backed by an index) on the node property:

CREATE CONSTRAINT ON (e:RESSOURCE) assert e.url is unique;

USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///Users/x/data.csv' AS line FIELDTERMINATOR '\t'
MERGE (subject:RESSOURCE {url: trim(line[0])})
MERGE (object:RESSOURCE {url: trim(line[1])})
CREATE (subject)-[:EQUIVALENCE]->(object);

The problem is that importing about 1 million edges performs very badly. I profiled the import as well as single MERGE queries, and I couldn't see any usage of the unique index. In contrast, a MATCH query does make use of the index. What can I do to make MERGE use the index?
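For reference, a single MERGE and the equivalent MATCH can be profiled side by side like this. The url value is a hypothetical placeholder; substitute one that exists in your data. In the MATCH plan you would expect to see an index seek operator, which is what the comparison is looking for.

```cypher
// Hypothetical url value, for illustration only
PROFILE
MERGE (r:RESSOURCE {url: 'http://example.org/resource/1'})
RETURN r;

// Same lookup as a MATCH, to compare the two plans
PROFILE
MATCH (r:RESSOURCE {url: 'http://example.org/resource/1'})
RETURN r;
```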

Asked Aug 19 '15 at 10:08 by Kai Schlegel



1 Answer

Peter is correct; here is some more explanation:

You are running into the Eager problem (see http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/). You should see it in your EXPLAIN output (remove the periodic commit and prefix the query with EXPLAIN):

+--------------+----------------------------------+-----------------------+
| Operator     | Identifiers                      | Other                 |
+--------------+----------------------------------+-----------------------+
| +EmptyResult |                                  |                       |
| |            +----------------------------------+-----------------------+
| +UpdateGraph | anon[179], line, object, subject | CreateRelationship    |
| |            +----------------------------------+-----------------------+
| +UpdateGraph | line, object, subject            | MergeNode; :RESSOURCE |
| |            +----------------------------------+-----------------------+
| +Eager       | line, subject                    |                       |
| |            +----------------------------------+-----------------------+
| +UpdateGraph | line, subject                    | MergeNode; :RESSOURCE |
| |            +----------------------------------+-----------------------+
| +LoadCSV     | line                             |                       |
+--------------+----------------------------------+-----------------------+
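To reproduce a plan like the one above, run the original import with USING PERIODIC COMMIT removed and EXPLAIN in front. EXPLAIN only plans the statement and does not execute it, so nothing is written to the database. The file path is the one from the question.

```cypher
// Original import, minus USING PERIODIC COMMIT, planned but not executed
EXPLAIN
LOAD CSV FROM 'file:///Users/x/data.csv' AS line FIELDTERMINATOR '\t'
MERGE (subject:RESSOURCE {url: trim(line[0])})
MERGE (object:RESSOURCE {url: trim(line[1])})
CREATE (subject)-[:EQUIVALENCE]->(object);
```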

The Eager operator pulls in your whole CSV file to ensure isolation between the two MERGEs, which effectively disables your periodic commit.

You could also split the import into two passes: first merge all nodes, then match them and create the relationships:

CREATE CONSTRAINT ON (e:RESSOURCE) assert e.url is unique;

USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///Users/x/data.csv' AS line FIELDTERMINATOR '\t'
// Slices are end-exclusive, so line[0..2] holds line[0] and line[1]
FOREACH (url IN line[0..2] |
   MERGE (:RESSOURCE {url: trim(url)})
);

USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///Users/x/data.csv' AS line FIELDTERMINATOR '\t'
MATCH (subject:RESSOURCE {url: trim(line[0])})
MATCH (object:RESSOURCE {url: trim(line[1])})
CREATE (subject)-[:EQUIVALENCE]->(object);
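To check the second pass, it can be planned the same way as before (without USING PERIODIC COMMIT, prefixed with EXPLAIN). Since it only matches nodes and creates relationships, the plan should no longer contain an Eager operator, though that is worth confirming on your version. After the import, a simple count gives a rough sanity check; it should be close to the number of lines in the CSV, since CREATE adds one relationship per line.

```cypher
// Pass 2 planned without USING PERIODIC COMMIT; look for an Eager operator
EXPLAIN
LOAD CSV FROM 'file:///Users/x/data.csv' AS line FIELDTERMINATOR '\t'
MATCH (subject:RESSOURCE {url: trim(line[0])})
MATCH (object:RESSOURCE {url: trim(line[1])})
CREATE (subject)-[:EQUIVALENCE]->(object);

// After the real import: count the relationships that were created
MATCH (:RESSOURCE)-[r:EQUIVALENCE]->(:RESSOURCE)
RETURN count(r);
```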
Answered Oct 15 '22 at 13:10 by Michael Hunger
