 

Slow performance bulk updating relationship properties in Neo4j

I'm struggling to efficiently bulk update relationship properties in Neo4j. The objective is to update ~500,000 relationships (each with roughly 3 properties), which I chunk into batches of 1,000 and process in a single Cypher statement:

UNWIND {rows} AS row
MATCH (s:Entity) WHERE s.uuid = row.source
MATCH (t:Entity) WHERE t.uuid = row.target
MATCH (s)-[r:CONSUMED]->(t)
SET r += row.properties

however, each batch of 1,000 relationships takes around 60 seconds. There is an index on the uuid property for the :Entity label, i.e. I've previously run:

CREATE INDEX ON :Entity(uuid)

which means that matching the relationship is super efficient per the query plan:

[query plan screenshot]

There are 6 total db hits and the query executes in ~150 ms. I've also added a uniqueness constraint on the uuid property, which ensures that each match returns exactly one node:

CREATE CONSTRAINT ON (n:Entity) ASSERT n.uuid IS UNIQUE

Does anyone know how I can further debug this to understand why it's taking Neo4j so long to process the relationships?

Note that I'm using similar logic for updating nodes, which is orders of magnitude faster even though the nodes have significantly more metadata associated with them.

For reference, I'm using Neo4j 3.0.3, py2neo, and Bolt. The Python code block is of the form:

for chunk in chunker(relationships): # 1,000 relationships per chunk
    with graph.begin() as tx:
        statement = """
            UNWIND {rows} AS row
            MATCH (s:Entity) WHERE s.uuid = row.source
            MATCH (t:Entity) WHERE t.uuid = row.target
            MATCH (s)-[r:CONSUMED]->(t)
            SET r += row.properties
            """

        rows = []

        for rel in chunk:
            rows.append({
                'properties': dict(rel),
                'source': rel.start_node()['uuid'],
                'target': rel.end_node()['uuid'],
            })

        tx.run(statement, rows=rows)
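For completeness, `chunker` is just a batching helper. A minimal sketch (the name and the 1,000-per-chunk batch size come from the snippet above; the implementation itself is my assumption):

```python
def chunker(seq, size=1000):
    """Yield successive fixed-size batches from a sequence.

    The final batch may be shorter than `size` when the
    sequence length is not an exact multiple of it.
    """
    for i in range(0, len(seq), size):
        yield seq[i:i + size]
```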
asked Oct 30 '22 by John

1 Answer

Try this query:

UNWIND {rows} AS row
WITH row.source as source, row.target as target, row
MATCH (s:Entity {uuid:source})
USING INDEX s:Entity(uuid)
WITH * WHERE true
MATCH (t:Entity {uuid:target})
USING INDEX t:Entity(uuid)
MATCH (s)-[r:CONSUMED]->(t)
SET r += row.properties;

It uses index hints to force an index lookup for both Entity nodes and then an Expand(Into) operator which should be more performant than the Expand(All) and Filter operators shown by your query plan.
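To verify where the time actually goes, you can also PROFILE the full batched statement rather than a single MATCH, passing a small sample of rows as the parameter (a debugging sketch; the row payload shape is taken from your Python code):

```
// PROFILE the batched statement itself so the UNWIND and
// Expand operators show up in the plan with real db-hit counts.
PROFILE
UNWIND {rows} AS row
MATCH (s:Entity) WHERE s.uuid = row.source
MATCH (t:Entity) WHERE t.uuid = row.target
MATCH (s)-[r:CONSUMED]->(t)
SET r += row.properties
```

If the plan for the batched version differs from the single-match plan you screenshotted (e.g. it falls back to a label scan instead of an index seek), that would explain the 60-second batches.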

answered Nov 18 '22 by William Lyon