How to use alternation with lookbehind

Tags: regex, r
Aim:

I would like to match sentences that contain the word 'no', but only if 'no' is not preceded by 'with', 'there is' or 'there are', in R.

Input:

The ground was rocky with no cracks in it
No diggedy, no doubt
Understandably, there is no way an elephant can be green

Expected output (after removing the matched sentences):

The ground was rocky with no cracks in it
Understandably, there is no way an elephant can be green

Attempt:

gsub(".*(?:((?<!with )|(?<!there is )|(?<!there are ))\\bno\\b(?![?:A-Za-z])|([?:]\\s*N?![A-Za-z])).*\\R*", "", input_string, perl=TRUE, ignore.case=TRUE)

Problem:

The negative lookbehind seems to be ignored, so all the sentences are replaced. Is the problem the use of alternation in the lookbehind?

Sebastian Zeki asked Oct 30 '17 08:10


2 Answers

You may use

(?mxi)^       # Start of a line (and free-spacing/case insensitive modes are on)
(?:           # Outer container group start
  (?!.*\b(?:with|there\h(?:is|are))\h+no\b) # no 'with/there is/are no' before 'no'
  .*\bno\b  # 'no' whole word after 0+ chars
  (?![?:])    # cannot be followed with ? or :
|             # or
  .*          # any 0+ chars
  [?:]\h*n(?![a-z]) # ? or : followed with 0+ spaces, 'n' not followed with any letter
)             # container group end
.*            # the rest of the line and 
\R*           # 0+ line breaks

In short, the pattern matches either of two kinds of lines: a line that contains the whole word no but no occurrence of with, there is or there are followed by whitespace and no, or a line that contains ? or : followed by 0+ horizontal whitespace characters (\h) and then an n that is not followed by a letter.

See the R demo:

sentences <- "The ground was rocky with no cracks in it\r\nNo diggedy, no doubt\r\nUnderstandably, there is no way an elephant can be green"
rx <- "(?mxi)^ # Start of a line
(?:            # Outer container group start
  (?!.*\\b(?:with|there\\h(?:is|are))\\h+no\\b) # no 'with/there is/are no' before 'no'
  .*\\bno\\b   # 'no' whole word after 0+ chars
  (?![?:])     # cannot be followed with ? or :
|              # or
  .*           # any 0+ chars
  [?:]\\h*n(?![a-z]) # ? or : followed with 0+ spaces, 'n' not followed with any letter
)              # container group end
.*             # the rest of the line and 0+ line breaks
\\R*"
res <- gsub(rx, "", sentences, perl=TRUE)
cat(res, sep="\n")

Output:

The ground was rocky with no cracks in it
Understandably, there is no way an elephant can be green

Thanks to the x modifier, you may add comments to the regex pattern and use spaces to format it for better readability. Note that all literal whitespace must be replaced with \\h (horizontal whitespace), \\s (any whitespace), \\n (LF), \\r (CR), etc. to make it work in such a pattern.
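
To illustrate the whitespace point, here is a minimal sketch (not from the original answer; the variable names are made up for the example) that compares a literal space, \\h and a space inside a character class under the x modifier:

```r
# Under (?x) the unescaped space is dropped, so the pattern reads "withno"
plain_space <- grepl("(?x)with no", "with no", perl = TRUE)

# \h (written \\h in an R string) survives free-spacing mode
escaped_space <- grepl("(?x)with \\h no", "with no", perl = TRUE)

# A space inside a character class is kept as well
class_space <- grepl("(?x)with [ ] no", "with no", perl = TRUE)

print(c(plain_space, escaped_space, class_space))
# [1] FALSE  TRUE  TRUE
```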

The (?i) modifier stands in for ignore.case=TRUE.
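
On the question of why the lookbehinds seem to be ignored: in the original attempt the three negative lookbehinds are joined with |, so the group succeeds as soon as any one of them holds, and at every position at least one does (a 'no' preceded by 'with ' is still not preceded by 'there is '). The sketch below is an illustration, not part of the original answer. It compares that form with chaining the lookbehinds and with putting the alternatives inside a single lookbehind, which PCRE accepts as long as each top-level alternative has a fixed length. It only looks at the lookbehind part of the attempt, and the variable names are made up for the example.

```r
# The three sentences from the question, one per element
sentences <- c(
  "The ground was rocky with no cracks in it",
  "No diggedy, no doubt",
  "Understandably, there is no way an elephant can be green"
)

# OR of negative lookbehinds: at least one branch always succeeds
alternated <- "((?<!with )|(?<!there is )|(?<!there are ))\\bno\\b"
# AND of negative lookbehinds: all three must hold at the same position
chained <- "(?<!with )(?<!there is )(?<!there are )\\bno\\b"
# One negative lookbehind with fixed-length top-level alternatives
single <- "(?<!with |there is |there are )\\bno\\b"

hits_alternated <- grepl(alternated, sentences, perl = TRUE, ignore.case = TRUE)
hits_chained <- grepl(chained, sentences, perl = TRUE, ignore.case = TRUE)
hits_single <- grepl(single, sentences, perl = TRUE, ignore.case = TRUE)

print(hits_alternated)
# [1] TRUE TRUE TRUE
print(hits_chained)
# [1] FALSE  TRUE FALSE
print(hits_single)
# [1] FALSE  TRUE FALSE
```

With either of the last two patterns, sentences[!hits_chained] keeps the two sentences from the expected output.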

Wiktor Stribiżew answered Oct 11 '22 00:10


You just need regex alternation. The idea is to match and capture every sentence in which "no" is preceded by one of the phrases, and to match all the remaining sentences without capturing them. Then replace each match with \\1, i.e. the contents of the first capturing group, which is empty for the sentences that were not captured.

gsub("(?i)(.*(with|there (?:is|are)) no\\b.*)|.*", "\\1", string, perl = TRUE)


Example:

x <- "The ground was rocky with no cracks in it\nNo diggedy, no doubt\nUnderstandably, there is no way an elephant can be green"
gsub("(?i)(.*(with|there (?:is|are)) no\\b.*\\n?)|.*\\n?", "\\1", x, perl = TRUE)
# [1] "The ground was rocky with no cracks in it\nUnderstandably, there is no way an elephant can be green"
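
If the sentences can be handled one line at a time, the same filtering can be sketched without a replacement step. This is an alternative illustration rather than the method above: it assumes the input is separated by \n, and sentence_lines, keep and res are names made up for the example.

```r
x <- "The ground was rocky with no cracks in it\nNo diggedy, no doubt\nUnderstandably, there is no way an elephant can be green"

# Split the input into one element per line
sentence_lines <- strsplit(x, "\n", fixed = TRUE)[[1]]

# Keep only the lines where "no" follows "with", "there is" or "there are"
keep <- grepl("\\b(?:with|there (?:is|are)) no\\b", sentence_lines,
              perl = TRUE, ignore.case = TRUE)

res <- paste(sentence_lines[keep], collapse = "\n")
cat(res, sep = "\n")
```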
Avinash Raj answered Oct 11 '22 01:10