
Regular expressions: find string without substring

Tags: regex

I have a big text:

"Big piece of text. This sentence includes 'regexp' word. And this sentence doesn't include that word" 

I need to find a substring that starts with 'this' and ends with 'word' but doesn't include the word 'regexp'.

In this case the string: "this sentence doesn't include that word" is exactly what I want to receive.

How can I do this via Regular Expressions?

Artem asked Aug 08 '12 17:08




2 Answers

With the ignore-case option enabled, the following should work:

\bthis\b(?:(?!\bregexp\b).)*?\bword\b 

Example: http://www.rubular.com/r/g6tYcOy8IT

Explanation:

\bthis\b              # match the word 'this', \b is for word boundaries
(?:                   # start group, repeated zero or more times, as few as possible
   (?!\bregexp\b)     # fail if 'regexp' can be matched (negative lookahead)
   .                  # match any single character
)*?                   # end group
\bword\b              # match 'word'

The \b surrounding each word makes sure that you aren't matching on substrings, like matching the 'this' in 'thistle', or the 'word' in 'wordy'.
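The effect of the word boundaries can be checked with a minimal sketch. Python's `re` module is an assumption here (the question does not name a language), and the sample strings are made up for illustration:

```python
import re

# Hypothetical sample patterns: the same word with and without boundaries.
loose = re.compile(r"this")
strict = re.compile(r"\bthis\b")

# 'thistle' contains the letters 'this', so the loose pattern finds it.
loose_hit = loose.search("a thistle") is not None
# With \b on both sides, 'this' must stand alone as a word.
strict_hit = strict.search("a thistle") is not None
strict_word_hit = strict.search("do this now") is not None

print(loose_hit, strict_hit, strict_word_hit)
```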

This works by checking, at each character position between your start word and your end word, that the excluded word does not begin there. If it does, the match attempt from that 'this' fails and the engine moves on to the next one.
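As a rough check, here is the pattern applied to the text from the question. This is a sketch in Python (an assumption, since the question does not name a language), with `re.IGNORECASE` standing in for the ignore-case option:

```python
import re

# The sample text from the question.
text = ("Big piece of text. This sentence includes 'regexp' word. "
        "And this sentence doesn't include that word")

# Same pattern as in the answer, compiled case-insensitively.
pattern = re.compile(r"\bthis\b(?:(?!\bregexp\b).)*?\bword\b", re.IGNORECASE)

match = pattern.search(text)
result = match.group(0) if match else None
print(result)  # this sentence doesn't include that word
```

The attempt that starts at the first 'This' fails when it reaches 'regexp', so the match comes from the second 'this'. Note that `.` does not match newlines by default, so for multi-line text you would likely need `re.DOTALL` as well.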

Andrew Clark answered Sep 26 '22 14:09


Use lookahead assertions.

When you want to check if a string does not contain another substring, you can write:

/^(?!.*substring)/ 
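A small sketch of this check in Python (an assumption; the helper name `lacks` is hypothetical and not from the answer). The lookahead consumes nothing, so a successful match is an empty string at the start of the line:

```python
import re

def lacks(line, substring):
    """Return True if `line` does not contain `substring` (hypothetical helper)."""
    # re.escape keeps any regex metacharacters in `substring` literal.
    pattern = re.compile(r"^(?!.*" + re.escape(substring) + r")")
    return pattern.match(line) is not None

without_word = lacks("This sentence doesn't have the word.", "regexp")
with_word = lacks('This sentence has the "regexp" word.', "regexp")
print(without_word, with_word)
```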

You must also anchor 'this' at the beginning of the line and 'word' at the end:

/^this(?!.*substring).*word$/ 
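To see what the anchored form accepts and rejects, here is a sketch in Python (an assumption), with the placeholder `substring` replaced by 'regexp' and test lines adapted from the question:

```python
import re

# Anchored form from the answer, with 'regexp' as the excluded substring.
anchored = re.compile(r"^this(?!.*regexp).*word$")

# Starts with 'this', ends with 'word', no 'regexp' in between.
ok = anchored.match("this sentence doesn't include that word") is not None
# Contains the excluded substring, so the lookahead rejects it.
excluded = anchored.match("this sentence includes 'regexp' word") is not None
# 'this' is not at the start of the line, so ^ rejects it.
not_at_start = anchored.match("And this sentence doesn't include that word") is not None

print(ok, excluded, not_at_start)
```

The last case shows the limitation of anchoring to the whole line: the wanted text is only found when it makes up the entire line.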

Another problem here is that you don't want to find arbitrary strings; you want to find sentences (if I understand your task right).

So the solution looks like this:

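perl -e '
  local $/;                          # slurp mode: read the whole input at once
  $_ = <>;
  while ($_ =~ /(.*?[.])/g) {        # take one sentence (text up to a period) at a time
    $s = $1;
    # \s* allows the space left after the previous period; /i matches "This" as well
    print $s if $s =~ /^\s*this(?!.*substring).*word[.]$/i;
  }
'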

Example of usage:

$ cat 1.pl
local $/;
$_=<>;
while($_ =~ /(.*?[.])/g) {
    $s=$1;
    print $s if $s =~ /^\s*this(?!.*regexp).*word[.]/i;
};

$ cat 1.txt
This sentence has the "regexp" word. This sentence doesn't have the word. This sentence does have the "regexp" word again.

$ cat 1.txt | perl 1.pl
 This sentence doesn't have the word.
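For comparison, the same split-then-filter idea can be sketched in Python. This is a port made for illustration, not part of the answer; the function name `matching_sentences` is hypothetical, and a sentence is assumed to be any run of text ending in a period, as in the Perl version:

```python
import re

def matching_sentences(text, excluded="regexp"):
    """Return sentences that start with 'this', end with 'word.' and lack `excluded`."""
    # Step 1: split into sentences, each one running up to the next period.
    sentences = re.findall(r".*?[.]", text)
    # Step 2: keep sentences that pass the lookahead-based filter.
    keep = re.compile(
        r"^\s*this(?!.*" + re.escape(excluded) + r").*word[.]$",
        re.IGNORECASE,
    )
    return [s.strip() for s in sentences if keep.match(s)]

# Same content as 1.txt in the usage example.
sample = ('This sentence has the "regexp" word. '
          "This sentence doesn't have the word. "
          'This sentence does have the "regexp" word again.')

found = matching_sentences(sample)
print(found)  # ["This sentence doesn't have the word."]
```

Unlike the Perl script, this sketch strips the leading space from each result before returning it.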
Igor Chubin answered Sep 24 '22 14:09