Aim:
I would like to match sentences with the word 'no', but only if 'no' is not preceded by 'with', 'there is' or 'there are', in R.
Input:
The ground was rocky with no cracks in it
No diggedy, no doubt
Understandably, there is no way an elephant can be green
Expected output:
The ground was rocky with no cracks in it
Understandably, there is no way an elephant can be green
Attempt:
gsub(".*(?:((?<!with )|(?<!there is )|(?<!there are ))\\bno\\b(?![?:A-Za-z])|([?:]\\s*N?![A-Za-z])).*\\R*", "", input_string, perl=TRUE, ignore.case=TRUE)
Problem:
The negative lookbehind seems to be ignored, so all the sentences are replaced. Is the problem the use of alternation in the lookbehind?
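A likely cause, sketched in R below: a group of alternated negative lookbehinds such as (?:(?<!with )|(?<!there is )|(?<!there are )) succeeds as soon as any one branch succeeds, and in front of 'with no' the 'there is ' branch always succeeds. Writing the lookbehinds one after another makes all of them required. The vector x is just the sample input from above; this sketch only tests the 'no' condition (not the extra '?'/':' rule from the attempt) and assumes exactly one space between the phrase and 'no'.

```r
x <- c("The ground was rocky with no cracks in it",
       "No diggedy, no doubt",
       "Understandably, there is no way an elephant can be green")

# Alternation: OR of the three lookbehinds, so it succeeds almost everywhere
bad <- "(?i)(?:(?<!with )|(?<!there is )|(?<!there are ))\\bno\\b"
# Consecutive lookbehinds: AND, so 'no' must follow none of the phrases
good <- "(?i)(?<!with )(?<!there is )(?<!there are )\\bno\\b"
# PCRE also accepts one lookbehind whose top-level alternatives differ in length
alt <- "(?i)(?<!with |there is |there are )\\bno\\b"

grepl(bad, x, perl = TRUE)   # TRUE TRUE TRUE
grepl(good, x, perl = TRUE)  # FALSE TRUE FALSE
grepl(alt, x, perl = TRUE)   # FALSE TRUE FALSE

# Keep only the sentences without a "bare" 'no'
x[!grepl(good, x, perl = TRUE)]
```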
Lookbehind works like lookahead, but backwards: it tells the regex engine to temporarily step backwards in the string to check whether the text inside the lookbehind can be matched there. Using negative lookbehind, (?<!a)b matches a "b" that is not preceded by an "a".
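In base R, lookbehind needs perl = TRUE, since the default (TRE) engine does not support lookaround. A minimal check of the (?<!a)b example on a few made-up strings:

```r
v <- c("ab", "cb", "b")
# TRUE where some "b" is not immediately preceded by "a"
grepl("(?<!a)b", v, perl = TRUE)  # FALSE TRUE TRUE
```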
JavaScript engines that predate ECMAScript 2018 do not support lookbehind, only lookahead.
A lookaround pattern that works in a grep test (for example with GNU grep -P, which uses Perl-compatible regexes) may still fail in sed, because sed does not support lookaround assertions.
Note: lookbehind in JavaScript regular expressions is not supported in Safari 7.1, so any user accessing your page through Safari 7.1 cannot rely on it.
Lookahead and lookbehind, together called lookaround, are special syntaxes for checking what surrounds a match. Lookaround matches characters but then gives up the match, returning only the result: match or no match.
Lookaround is divided into lookbehind and lookahead assertions. Lookbehind checks what comes before your regex match, while lookahead checks what comes after it; the presence or absence of an element there decides whether a match is declared.
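A small R sketch of the difference, on a made-up string: the lookbehind checks the "$" before the digits and the lookahead checks the " kg" after them, but neither is part of the returned match.

```r
s <- "price: $42, weight: 17 kg"
# Lookbehind: digits preceded by "$"; the "$" is checked, not consumed
regmatches(s, gregexpr("(?<=\\$)\\d+", s, perl = TRUE))[[1]]  # "42"
# Lookahead: digits followed by " kg"; " kg" is checked, not consumed
regmatches(s, gregexpr("\\d+(?= kg)", s, perl = TRUE))[[1]]   # "17"
```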
Lookbehind assertions are sometimes thought to be difficult to understand and construct; however, if some basic rules are followed, they are as simple as any other regular expression element or group.
Lookbehind adds a condition on what comes before the match: the pattern matches only if a certain text is (positive lookbehind, (?<=...)) or is not (negative lookbehind, (?<!...)) immediately before it.
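Both forms on one made-up string, using gregexpr to report match positions: the positive lookbehind finds the 'no' right after 'with ', the negative one finds only the other 'no'.

```r
s <- "with no cracks, no doubt"
# Positive lookbehind: 'no' only where "with " comes right before it
pos <- as.integer(gregexpr("(?<=with )\\bno\\b", s, perl = TRUE)[[1]])
# Negative lookbehind: 'no' only where "with " does NOT come right before it
neg <- as.integer(gregexpr("(?<!with )\\bno\\b", s, perl = TRUE)[[1]])
pos  # 6
neg  # 17
```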
You may use
(?mxi)^ # Start of a line (and free-spacing/case insensitive modes are on)
(?: # Outer container group start
(?!.*\b(?:with|there\h(?:is|are))\h+no\b) # no 'with/there is/are no' before 'no'
.*\bno\b # 'no' whole word after 0+ chars
(?![?:]) # cannot be followed with ? or :
| # or
.* # any 0+ chars
[?:]\h*n(?![a-z]) # ? or : followed with 0+ spaces, 'n' not followed with any letter
) # container group end
.* # the rest of the line and
\R* # 0+ line breaks
In short, the pattern matches either of 2 types of lines: a line containing the whole word 'no' that is not preceded by 'with', 'there is' or 'there are' plus whitespace (the lookahead at the line start rejects any line containing such a phrase), or a line that contains '?' or ':' followed by 0+ horizontal whitespace chars (\h) and then an 'n' not followed by any other letter.
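To check the first branch in isolation, here is a compact one-line version of it applied to a character vector rather than one multi-line string, so (?m) and \R are not needed (each sentence is its own element). It flags the sentence that the answer removes.

```r
x <- c("The ground was rocky with no cracks in it",
       "No diggedy, no doubt",
       "Understandably, there is no way an elephant can be green")
# First branch only: a whole-word 'no', and no "with/there is/there are" + 'no' on the line
rx1 <- "(?i)^(?!.*\\b(?:with|there\\h(?:is|are))\\h+no\\b).*\\bno\\b(?![?:])"
grepl(rx1, x, perl = TRUE)      # FALSE TRUE FALSE
x[!grepl(rx1, x, perl = TRUE)]  # the two sentences to keep
```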
See the R demo:
sentences <- "The ground was rocky with no cracks in it\r\nNo diggedy, no doubt\r\nUnderstandably, there is no way an elephant can be green"
rx <- "(?mxi)^ # Start of a line
(?: # Outer container group start
(?!.*\\b(?:with|there\\h(?:is|are))\\h+no\\b) # no 'with/there is/are no' before 'no'
.*\\bno\\b # 'no' whole word after 0+ chars
(?![?:]) # cannot be followed with ? or :
| # or
.* # any 0+ chars
[?:]\\h*n(?![a-z]) # ? or : followed with 0+ spaces, 'n' not followed with any letter
) # container group end
.* # the rest of the line and 0+ line breaks
\\R*"
res <- gsub(rx, "", sentences, perl=TRUE)
cat(res, sep="\n")
Output:
The ground was rocky with no cracks in it
Understandably, there is no way an elephant can be green
Thanks to the x modifier, you may add comments to the regex pattern and use whitespace to format it for better readability. Note that literal whitespace in the pattern is ignored in this mode, so any whitespace you need to match must be written as \\h (horizontal whitespace), \\s (any whitespace), \\n (LF), \\r (CR), etc.
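A quick R check of that rule: in (?x) mode the spaces in "there is no" are dropped from the pattern, so it only matches once they are written as \\h, and a # starts a comment.

```r
s <- "there is no way"
grepl("(?x) there is no", s, perl = TRUE)                      # FALSE: pattern is "thereisno"
grepl("(?x) there \\h is \\h no", s, perl = TRUE)              # TRUE
grepl("(?x) there \\h is \\h no  # a comment", s, perl = TRUE) # TRUE
```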
The (?i) modifier is the inline equivalent of ignore.case=TRUE.
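For example, these two calls behave the same on a made-up string, while the plain pattern stays case-sensitive:

```r
grepl("(?i)\\bno\\b", "No doubt", perl = TRUE)                   # TRUE
grepl("\\bno\\b", "No doubt", perl = TRUE, ignore.case = TRUE)  # TRUE
grepl("\\bno\\b", "No doubt", perl = TRUE)                       # FALSE
```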
You just need a regex alternation. The idea is to match and capture the sentences where 'no' follows 'with', 'there is' or 'there are', and to match (without capturing) all the remaining sentences. Then replace every match with \\1, i.e. the text of the first capturing group: kept sentences are written back unchanged, and the rest become empty.
gsub("(?i)(.*(with|there (?:is|are)) no\\b.*)|.*", "\\1", string, perl=TRUE)
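Applied to a character vector of sentences (one per element, assuming they are already split), the same pattern turns the unwanted sentences into empty strings, which can then be dropped with nzchar().

```r
x <- c("The ground was rocky with no cracks in it",
       "No diggedy, no doubt",
       "Understandably, there is no way an elephant can be green")
# Group 1 captures the sentences to keep; everything else is replaced by ""
res <- gsub("(?i)(.*(with|there (?:is|are)) no\\b.*)|.*", "\\1", x, perl = TRUE)
res               # second element is now ""
res[nzchar(res)]  # only the two sentences to keep
```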
Example:
x <- "The ground was rocky with no cracks in it\nNo diggedy, no doubt\nUnderstandably, there is no way an elephant can be green"
gsub("(?i)(.*(with|there (?:is|are)) no\\b.*\\n?)|.*\\n?", "\\1", x, perl=TRUE)
# [1] "The ground was rocky with no cracks in it\nUnderstandably, there is no way an elephant can be green"