I have a simple model of a chess tournament. It has 5 players playing each other. The graph looks like this:
The graph is generally fine, but upon further inspection, you can see that both sets
Guy1 vs Guy2,
and
Guy4 vs Guy5
have a redundant relationship each.
The problem is obviously in the data, where there is a extraneous complementary row for each of these matches (so in a sense this is a data quality issue in the underlying csv):
I could clean these rows by hand, but the real dataset has millions of rows. So I'm wondering how I could remove these relationships in either of 2 ways, using CQL:
1) Don't read in the extra relationship in the first place
2) Go ahead and create the extra relationship, but then remove it later.
Thanks in advance for any advice on this.
The code I'm using is this:
/ Here, we load and create nodes
LOAD CSV WITH HEADERS FROM
'file:///.../chess_nodes.csv' AS line
WITH line
MERGE (p:Player {
player_id: line.player_id
})
ON CREATE SET p.name = line.name
ON MATCH SET p.name = line.name
ON CREATE SET p.residence = line.residence
ON MATCH SET p.residence = line.residence
// Here create the edges
LOAD CSV WITH HEADERS FROM
'file:///.../chess_edges.csv' AS line
WITH line
MATCH (p1:Player {player_id: line.player1_id})
WITH p1, line
OPTIONAL MATCH (p2:Player {player_id: line.player2_id})
WITH p1, p2, line
MERGE (p1)-[:VERSUS]->(p2)
It is obvious that you don't need this extra relationship as it doesn't add any value nor weight to the graph.
There is something that few people are aware of, despite being in the documentation.
MERGE
can be used on undirected
relationships, neo4j will pick one direction for you (as realtionships MUST be directed in the graph).
Documentation reference : http://neo4j.com/docs/stable/query-merge.html#merge-merge-on-an-undirected-relationship
An example with the following statement, if you run it for the first time :
MATCH (a:User {name:'A'}), (b:User {name:'B'})
MERGE (a)-[:VERSUS]-(b)
It will create the relationship as it doesn't exist. However if you run it a second time, nothing will be changed nor created.
I guess it would solve your problem as you will not have to worry about cleaning the data in upfront nor run scripts afterwards for cleaning your graph.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With