Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove duplicates from Node array properties

Tags:

neo4j

cypher

I have a property A on my nodes that holds an array of string values:

n.A=["ABC","XYZ","123","ABC"]

During merges I frequently will write code similar to n.A = n.A + "New Value". The problem I've been running into is that I end up with duplicate values in my arrays; not insurmountable but I'd like to avoid it.

  • How can I write a cypher query that will remove all duplicate values from the array A? Some duplicates have already been inserted at this point, and I'd like to clean them up.
  • When adding new values to existing arrays how can I make sure I only save a copy of the array with distinct values? (may end up being the exact same logic as used to solve my first question)
like image 275
Peter Avatar asked Feb 09 '23 15:02

Peter


1 Answers

The query to add a non-duplicate value can be done efficiently (in this example, I assume that the id and newValue parameters are provided):

OPTIONAL MATCH (n {id: {id}})
WHERE NONE(x IN n.A WHERE x = {newValue})
SET n.A = n.A + {newValue};

This query does not create a temporary array, and will only alter the n.A array if it does not already contain the {newValue} string.

[EDITED]

If you want to (a) create the n node if it does not already exist, and (b) append {newValue} to n.A only if {newValue} is not already in n.A, this should work:

OPTIONAL MATCH (n { id: {id} })
FOREACH (x IN (
  CASE WHEN n IS NULL THEN [1] ELSE [] END ) | 
  CREATE ({ id: {id}, A: [{newValue}]}))
WITH n, CASE WHEN EXISTS(n.A) THEN n.A ELSE [] END AS nA
WHERE NONE (x IN nA WHERE x = {newValue})
SET n.A = nA + {newValue};

If the OPTIONAL MATCH fails, then the FOREACH clause will create a new node node (with the {id} and an array containing {newValue}), and the following SET clause will do nothing because n would be NULL.

If the OPTIONAL MATCH succeeds, then the FOREACH clause will do nothing, and the following SET clause will append {newValue} to n.A iff that value does not already exist in n.A. If the SET should be performed, but the existing node did not already have the n.A property, then the query would concatenate an empty array to {newValue} (thus generating an array containing just {newValue}) and set that as the n.A value.

like image 137
cybersam Avatar answered Feb 16 '23 17:02

cybersam