I'm struggling to efficiently bulk update relationship properties in Neo4j. The objective is to update ~500,000 relationships (each with roughly 3 properties). I chunk them into batches of 1,000 and process each batch in a single Cypher statement,
UNWIND {rows} AS row
MATCH (s:Entity) WHERE s.uuid = row.source
MATCH (t:Entity) WHERE t.uuid = row.target
MATCH (s)-[r:CONSUMED]->(t)
SET r += row.properties
however each batch of 1,000 relationships takes around 60 seconds. There is an index on the uuid property for the :Entity label, i.e. I've previously run,
CREATE INDEX ON :Entity(uuid)
which means that matching the relationship is super efficient per the query plan: there are 6 total db hits and the query executes in ~150 ms. I've also added a uniqueness constraint on the uuid property, which ensures that each match returns exactly one node,
CREATE CONSTRAINT ON (n:Entity) ASSERT n.uuid IS UNIQUE
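For reference, I checked the plan by profiling a single relationship match along these lines (the uuid literals here are placeholders, not real values):

PROFILE
// placeholder uuids standing in for real values
MATCH (s:Entity) WHERE s.uuid = 'source-uuid'
MATCH (t:Entity) WHERE t.uuid = 'target-uuid'
MATCH (s)-[r:CONSUMED]->(t)
RETURN r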
Does anyone know how I can further debug this to understand why it's taking Neo4j so long to process the relationships?
Note that I'm using similar logic for updating nodes, which is orders of magnitude faster even though the nodes have significantly more metadata associated with them.
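For comparison, the node update statement has the same shape; this is only a minimal sketch (the real payload carries more properties, and row.uuid is illustrative):

UNWIND {rows} AS row
// row.uuid is an illustrative key; the real payload differs
MATCH (n:Entity) WHERE n.uuid = row.uuid
SET n += row.properties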
For reference I'm using Neo4j 3.0.3, py2neo, and Bolt. The Python code block is of the form,
for chunk in chunker(relationships):  # 1,000 relationships per chunk
    with graph.begin() as tx:
        statement = """
        UNWIND {rows} AS row
        MATCH (s:Entity) WHERE s.uuid = row.source
        MATCH (t:Entity) WHERE t.uuid = row.target
        MATCH (s)-[r:CONSUMED]->(t)
        SET r += row.properties
        """
        rows = []
        for rel in chunk:
            rows.append({
                'properties': dict(rel),
                'source': rel.start_node()['uuid'],
                'target': rel.end_node()['uuid'],
            })
        tx.run(statement, rows=rows)
Try this query:
UNWIND {rows} AS row
WITH row.source as source, row.target as target, row
MATCH (s:Entity {uuid:source})
USING INDEX s:Entity(uuid)
WITH * WHERE true
MATCH (t:Entity {uuid:target})
USING INDEX t:Entity(uuid)
MATCH (s)-[r:CONSUMED]->(t)
SET r += row.properties;
It uses index hints to force an index lookup for both Entity nodes, followed by an Expand(Into) operator, which should be more performant than the Expand(All) and Filter operators shown in your query plan. (The WITH * WHERE true between the two MATCH clauses acts as a planner barrier so that an index hint can be applied to each lookup.)
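To double-check, you can PROFILE the hinted query with a one-row literal batch (placeholder uuids below); the plan should show a NodeIndexSeek for both s and t followed by Expand(Into):

PROFILE
// a single literal row standing in for the real batch parameter
UNWIND [{source: 'source-uuid', target: 'target-uuid', properties: {}}] AS row
WITH row.source AS source, row.target AS target, row
MATCH (s:Entity {uuid: source})
USING INDEX s:Entity(uuid)
WITH * WHERE true
MATCH (t:Entity {uuid: target})
USING INDEX t:Entity(uuid)
MATCH (s)-[r:CONSUMED]->(t)
SET r += row.properties;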