
negative lookahead regex on elasticsearch

I'm trying to use a negative lookahead in an Elasticsearch query. The regex is:

(?!.*charge)(?!.*encode)(?!.*relate).*night.*

The text that I'm matching against is:

credited back on night stay, still having issues with construction. causing health issues due to a chemical being sprayed and causes eyes to irritated.

I haven't had any luck. Can someone give me a hand?

ES query:

  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must_not": [
            {
              "regexp": {
                "message": {
                  "value": "(?!.*charge)(?!.*encode)(?!.*relate).*night.*",
                  "flags_value": 65535
                }
              }
            }
          ]
        }
      },
      "filter": {
        "match": {
          "resNb": {
            "query": "462031152161",
            "type": "boolean"
          }
        }
      }
    }
  }
Reginaldo Soares asked Jul 28 '16 20:07



1 Answer

Solution

You can solve the issue with either of these two patterns:

"value": "~(charge|encode|relate)night~(charge|encode|relate)",

or

.*night.*&~(.*(charge|encode|relate).*)

Optionally, add the following flag (it is ON by default):

"flags" : "ALL"
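
As a minimal sketch of how the second pattern and the flag could sit together in a request body, the snippet below builds the JSON in Python. The helper name build_regexp_query is hypothetical, and the surrounding filtered/bool wrapper from the question is left out on purpose, since whether the clause belongs under must or must_not depends on what you want to select.

```python
import json


def build_regexp_query(pattern: str, field: str = "message") -> dict:
    """Hypothetical helper: wrap a pattern in a regexp query on `field`."""
    return {
        "query": {
            "regexp": {
                field: {
                    "value": pattern,
                    # Optional: ALL is the default, shown here for clarity.
                    "flags": "ALL",
                }
            }
        }
    }


# The intersection/complement pattern from the answer.
body = build_regexp_query(".*night.*&~(.*(charge|encode|relate).*)")
print(json.dumps(body, indent=2))
```

The printed JSON can then be pasted into whichever client you use to send the search request.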

How does it work?

In common NFA regular expressions, you usually rely on negative lookarounds, such as (?!...) or (?<!...), to restrict a more generic pattern. Elasticsearch does not support them; instead, you need to use its own optional operators.

The ~ (tilde) is the complement operator, used to negate the atom right after it. An atom is either a single symbol or a group of subpatterns/alternatives inside parentheses.

NOTE: all ES patterns are anchored at the start and end of the string by default, so you never need the ^ and $ anchors that are common in Perl-like, .NET, and other NFA flavors.
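
To illustrate what implicit anchoring means in practice, here is a small sketch in Python. Python's re module is not the engine Elasticsearch uses; re.fullmatch is only standing in for "the pattern must cover the whole string", and the sample string is made up for the illustration.

```python
import re

text = "one night stay"

# Unanchored search: finds "night" anywhere in the text.
found_anywhere = re.search(r"night", text) is not None

# Whole-string match: "night" alone does not cover the entire text.
covers_whole = re.fullmatch(r"night", text) is not None

# Padding with .* on both sides lets the pattern cover the entire text.
covers_whole_padded = re.fullmatch(r".*night.*", text) is not None

print(found_anywhere, covers_whole, covers_whole_padded)
```

This is why the patterns in this answer spell out what surrounds night instead of relying on a substring search.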

Thus,

  • ~(charge|encode|relate) - matches any text from the start of the string other than charge, encode or relate
  • night - matches the word night
  • ~(charge|encode|relate) - matches any text other than any of the 3 substrings, up to the end of the string.

In an NFA regex like Perl, you could write that pattern using a tempered greedy token:

/^(?:(?!charge|encode|relate).)*night(?:(?!charge|encode|relate).)*$/
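
The tempered greedy token above can be tried out against the text from the question with a short Python sketch. This only exercises the Perl-like regex in Python's re module, not the Elasticsearch pattern, and the two rejected strings are made-up examples.

```python
import re

# Each "." is consumed only if none of the forbidden words starts at that position.
TEMPERED = re.compile(
    r"^(?:(?!charge|encode|relate).)*night(?:(?!charge|encode|relate).)*$"
)

sample = (
    "credited back on night stay, still having issues with construction. "
    "causing health issues due to a chemical being sprayed and causes eyes "
    "to irritated."
)

sample_matches = TEMPERED.search(sample) is not None
charge_before = TEMPERED.search("charged for one night") is not None
relate_after = TEMPERED.search("one night, related issue") is not None

print(sample_matches, charge_before, relate_after)
```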

The second pattern is trickier. Common NFA regexes usually do not jump from location to location when matching, so lookaheads anchored at the start of the text are commonly used instead. Here, with the INTERSECTION operator (&), we can simply combine two patterns that must both match the whole string:

  • .*night.* - matches the whole line with night in it (as . matches any symbol but a newline; otherwise, use (.|\n)*)
  • & - and (intersection)
  • ~(.*(charge|encode|relate).*) - matches a line that does not contain any of the charge, encode or relate substrings.
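
The logic of the three parts above can be emulated in Python with two whole-string checks, one for each side of the intersection. This is a sketch of the idea only: Python's re has no & or ~ operators, so the function name matches_intersection and the boolean combination are illustrative assumptions, not how Elasticsearch evaluates the pattern.

```python
import re


def matches_intersection(text: str) -> bool:
    """Emulate .*night.*&~(.*(charge|encode|relate).*) with two checks."""
    # Left side of &: the whole string contains "night".
    has_night = re.fullmatch(r".*night.*", text) is not None
    # Right side of &: complement of "the whole string contains a forbidden word".
    has_forbidden = re.fullmatch(r".*(charge|encode|relate).*", text) is not None
    return has_night and not has_forbidden


sample = (
    "credited back on night stay, still having issues with construction. "
    "causing health issues due to a chemical being sprayed and causes eyes "
    "to irritated."
)

print(matches_intersection(sample))
print(matches_intersection("night charge applied"))
print(matches_intersection("day stay only"))
```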

An equivalent NFA Perl-like regex would look like this:

/^(?!.*(charge|encode|relate)).*night.*$/
Wiktor Stribiżew answered Sep 22 '22 13:09