I have a domain class which has a property by name "alias" which is an arraylist of strings like below:
private List<String> alias;
alias contains the following values: {"John Doe","Alex Smith","Greg Walsh"}
I'd like to be able to make a query like: "I saw Smith today" using my repository query shown below and get the array value Output "Alex Smith":
@Query("MATCH (p:Person) WHERE {0} IN p.alias RETURN p")
Iterable<Person> findByAlias(String query);
I tried a bunch of different queries like the one shown above but this would only match if the input query matches the array value exactly.
I want do a input query sub-string match with the array values.
Eg: Input Query: "I saw Smith today" Output: "Alex Smith"
It is sort of possible to do what you want, but the query will be horribly ugly and slow. You'd be better off using nodes and relationships instead of collection properties: it will make your queries more sensible and allows you to use a full-text index. You should also figure out what part of the 'input string' you are looking for before you send your query to the database. As it stands, you are confusing the regex pattern with the data it is supposed to match and even if it were possible to express your intention as a regex it would be much better to do that processing your application, before sending the query.
WHERE ... IN ...
doesn't do regexWHERE x IN y
will not treat x
as a regular expression, it will take x
's value for what it is and look for an exact match. WHERE ... IN ...
is analogous to WHERE ... = ...
in this sense, and you would need the analogue of =~
for collections, something like IN~
, for this. There is no such construct in Cypher.
You can use a string as a regular expression to test for matches over a collection by using a predicate like ANY
or FILTER
.
CREATE (p:Person {collectionProperty:["Paulo","Jean-Paul"]})
and
WITH "(?i).*Paul" as param
MATCH (p:Person)
WHERE ANY(item IN p.collectionProperty WHERE item =~ param)
RETURN p
will return the node because it makes a successful regular expression match on "Jean-Paul".
This, however, will have terrible performance since you will run your regular expression on every item
in every collectionProperty
for every :Person
in your database. The solution is to use a full-text index, but his query can't use indices for two reasons:
The biggest problem with your query is that you are trying to turn "I saw Smith today"
into "Smith"
by adding some regular expression sugar. How do you intend to do that? If you use the string as a regular expression, each of those characters is a literal character expected to be in the data that you match it on. You are confused about .*
, which when used in 'Smith.*'
would match 'Smith'
plus zero or more additional characters in the data. But you try to use it to say that zero or more characters may follow something in the pattern.
Take the query in your comment:
MATCH (p:Person)
WHERE '(?i).*I saw Smith today.*' IN p.alias
RETURN p
The regular expression '(?i).*I saw Smith today.*'
will match
'i SAW smith TOday'
, etc.'Yes, I saw Smith today and he looked happy.'
But adding .*
won't somehow magically make the pattern mean '.*Smith.*'
. What's more, it's almost impossible to express 'I saw Smith today'
as a subset of 'Alex Smith'
by any amount of added regular expression sugar. Instead, you should process that string and figure out what parts of it you want to use in a regular expression before you send your query. How is the database to know that 'Smith'
is the part of the input string that you want to use? However it is that you know it, you should know it before you send the query, and only include that relevant part.
You could insert a
?
after each character in the pattern to make each character optionalRETURN "Smith" =~ "I? ?s?a?w? ?S?m?i?t?h? ?t?o?d?a?y?"
But now your pattern is way too loosie goosie, and it will match strings like
'I sat today'
and'sam toy
'. Moreover, it won't match'Alex Smith
' unless you prepend.*
, but then it is even more loosie goosie and will match any string whatever.
You could divide characters that belong together into groups and make the groups and the spaces between them optional.
RETURN "Smith" =~ "(I)? ?(saw)? ?(Smith)? ?(today)?"
But this also is too broad, fails to match
'Alex Smith'
and will match any string whatever if you prepend.*
.
The only 'solution' I can think of is a hideous query that splits your string on whitespace, concatenates some regex sugar into each word and compares it as a regular expression in a predicate clause. It really is hideous, and it assumes that you already know that you want match the words in the string and not the whole string, in which case you should be doing that processing before you send your query and not in Cypher. Look upon this hideousness and weep
WITH "I saw Paul today" AS paramString
MATCH (p:Person)
WHERE ANY (param IN split(paramString, ' ')
WHERE ANY (item IN p.collectionProperty
WHERE item =~('(?i).*' + param)))
RETURN p
The conclusion is as follows:
Keep a node for each alias like so
CREATE (a:Alias)
SET a.aliasId = "Alex Smith"
Create a full-text index for these nodes. See blog and docs for the generic case and docs for SDN.
Connect the nodes that now have the alias in a collection property to the new alias node with a relationship.
Look up the alias node that you want and follow the relationship to the node that 'has' the alias. A node can still have many aliases, but you no longer store them in a collection property–your query logic will be more straightforward and you will benefit from the full-text lucene index. Query with START n=node:indexName("query")
when using cypher and use findAllByQuery()
in SDN. This is necessary for the query to use the full-text index.
Your query might then finally look something like
START n=node:myIndex("aliasId:*smith")
MATCH n<-[:HAS_ALIAS]-smith
RETURN smith
If your program is supposed to receive a string like 'I saw Smith today'
and give back a node based on a pattern match on 'Smith'
, then don't send 'I saw'
and 'today'
to the database. You're better off identifying 'Smith'
as the relevant part of the string in your application–when you send the query you should already know what it is you want.
You should use something like this:
MATCH (n:Test)
WHERE single(x IN n.prop WHERE x = "elem1")
RETURN n
It checks that collection has exactly one "elem1".
More info.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With