I have the following two node types:
c:City {name: 'blah'}
s:Course {title: 'whatever', city: 'New York'}
Looking to create this:
(s)-[:offered_in]->(c)
I'm trying to get all courses that are NOT tied to cities and create the relationship to the city (city gets created if doesn't exist). However, the issue is that my dataset is about 5 million nodes and any query i make times out (unless i do in increment of 10k).
... anybody has any advice?
EDIT:
Here is a query for jobs i'm running now (that has to be done in 10k chunks (out of millions) because it takes few minutes as it is. creates city if doesn't exist):
match (j:Job)
where not has(j.merged) and has(j.city)
WITH j
LIMIT 10000
MERGE (c:City {name: j.city})
WITH j, c
MERGE (j)-[:in]->(c)
SET j.merged = 1
return count(j)
(for now don't know of a good way to filter out the ones already matched, so trying to do it by tagging it with custom "merged" attribute that i already have an index on)
500000 is a fair few nodes and on your other question you suggested 90% were without the relationship that you want to create here, so it is going to take a bit of time. Without more knowledge of your system (spec, neo setup, programming environment) and when you are running this (on old data or on insert) this is just a best guess at a tidier solution:
MATCH (j:Job)
WHERE NOT (j)-[:IN]->() AND HAS(j.city)
MERGE (c:City {name: j.city})
MERGE (j)-[:IN]->(c)
return count(j)
Obviously you can add your limits back as required.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With