REGEXP: capture group NOT followed by

I need to match the following statements:

Hi there John
Hi there John Doe (jdo)

Without matching these:

Hi there John Doe is here
Hi there John is here

So I figured that this regexp would work:

^Hi there (.*)(?! is here)$

But it does not, and I am not sure why. I believed this might be caused by the capturing group (.*), so I thought that making the * operator lazy might solve the problem, but no. This regexp doesn't work either:

^Hi there (.*?)(?! is here)$

Can anyone point me in the direction of a solution?

Solution

To retrieve a sentence without is here at the end (like Hi there John Doe (the second)), you should use the following (author: @Thorbear):

^Hi there (.*$)(?<! is here)

And for a sentence that contains some data in the middle (like Hi there John Doe (the second) is here, with John Doe (the second) being the desired data), simple grouping would suffice:

^Hi there (.*?) is here$
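As a rough sketch of how the two patterns above can be combined, here is a small Python example using the standard re module. The helper name extract_name and the constant names are made up for illustration, and the choice of Python is an assumption, since the question does not name a regex engine:

```python
import re

# Pattern for lines that end with " is here": the name sits in the middle.
WITH_SUFFIX = re.compile(r"^Hi there (.*?) is here$")
# Pattern for lines that do NOT end with " is here" (negative lookbehind).
WITHOUT_SUFFIX = re.compile(r"^Hi there (.*$)(?<! is here)")


def extract_name(line):
    """Return (name, is_here) for a greeting line, or None if it is not one."""
    m = WITH_SUFFIX.match(line)
    if m:
        return m.group(1), True
    m = WITHOUT_SUFFIX.match(line)
    if m:
        return m.group(1), False
    return None


print(extract_name("Hi there John Doe (the second)"))
print(extract_name("Hi there John Doe (the second) is here"))
```

Both calls capture John Doe (the second); only the second element of the tuple differs, telling you which of the two patterns matched.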

.

           ╔══════════════════════════════════════════╗
           ║▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒║
           ║▒▒▒Everyone, thank you for your replies▒▒▒║
           ║▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒║
           ╚══════════════════════════════════════════╝
MatBos asked Aug 01 '12 14:08


1 Answer

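The pattern will find a match whether or not .* is greedy: the $ forces the group to run to the end of the line, and at that position the negative lookahead always succeeds, because nothing (and so, naturally, no is here) follows.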
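To see this in practice, here is a minimal check written in Python with the standard re module (Python is an assumption here; the question does not say which engine is used). It runs both lookahead patterns from the question against the four sample lines:

```python
import re

lines = [
    "Hi there John",
    "Hi there John Doe (jdo)",
    "Hi there John Doe is here",
    "Hi there John is here",
]

# The two patterns from the question: greedy and lazy group, negative lookahead.
greedy = re.compile(r"^Hi there (.*)(?! is here)$")
lazy = re.compile(r"^Hi there (.*?)(?! is here)$")

greedy_hits = [bool(greedy.match(s)) for s in lines]
lazy_hits = [bool(lazy.match(s)) for s in lines]

print(greedy_hits)  # [True, True, True, True]
print(lazy_hits)    # [True, True, True, True]

# The unwanted text simply ends up inside the capture group.
print(greedy.match(lines[3]).group(1))  # John is here
```

All four lines match with either pattern, including the two that should be rejected.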

A solution to this could be to use a lookbehind instead: checking, from the end of the line, whether the preceding characters match is here.

^Hi there (.*)(?<! is here)$
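A quick way to verify the lookbehind version against the four lines from the question, again sketched in Python's re module as an assumed engine (it accepts this lookbehind because is here has a fixed width):

```python
import re

# Negative lookbehind: at the end of the line, the text just before
# the current position must not be " is here".
pattern = re.compile(r"^Hi there (.*)(?<! is here)$")

samples = [
    "Hi there John",
    "Hi there John Doe (jdo)",
    "Hi there John Doe is here",
    "Hi there John is here",
]

# Map each line to its captured group, or None when the line is rejected.
results = {}
for s in samples:
    m = pattern.match(s)
    results[s] = m.group(1) if m else None

for line, captured in results.items():
    print(repr(line), "->", repr(captured))
```

The first two lines yield John and John Doe (jdo); the last two yield None, because no position satisfies both the lookbehind and the $ anchor.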

Edit

As suggested by Alan Moore, further changing the pattern to ^Hi there (.*$)(?<! is here) will improve the performance of the pattern, because the capturing group will then gobble up the rest of the string before the lookbehind is attempted, thus saving you unnecessary backtracking.

Thorbear answered Oct 24 '22 13:10