
Negative lookahead followed by spaces

I can't figure out negative lookbehind. Suppose I have the text

qwe  abc
qwe abc
abc

and I want to find every abc that is not preceded by qwe, where qwe may be followed by any number of spaces.

(?<!qwe)\s*?(abc)

This matches everything. I assumed it would mean something like "match an arbitrary number of spaces followed by abc if there is no qwe in front of it".

I also tried the

qwe|(abc)

approach, but it does not work for me. Although the group is empty in the cases where I do not want a match, I don't understand how to use it with the re.sub function (which I need to). Even though the group is empty, re.sub still replaces the string.

Env: python 3

Roman asked Apr 09 '21 13:04


3 Answers

You don't need a lookbehind here. Use a negative lookahead instead, since a lookahead (unlike a lookbehind in re) may contain a variable-length pattern:

^(?!.*qwe\s+abc).*abc

Or, with word boundaries to make sure qwe and abc are matched as complete words:

^(?!.*\bqwe\s+abc\b).*\babc\b


RegEx Explanation:

  • ^: Start
  • (?!.*qwe\s+abc): Negative lookahead that fails the match if qwe followed by 1+ whitespace characters and then abc is found anywhere in the line
  • .*: Match 0 or more of any characters
  • abc: Match abc
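
Since the question is about re.sub, here is a minimal sketch of how this pattern might be used for a replacement. The sample text is the one from the question. The replacement string XYZ, the capture group around the leading .* and the re.MULTILINE flag are assumptions added for illustration and are not part of the pattern above.

```python
import re

text = "qwe  abc\nqwe abc\nabc"

# The pattern from this answer, with the leading .* wrapped in a group
# (an addition for this sketch) so the text before abc can be restored.
pattern = re.compile(r"^(?!.*\bqwe\s+abc\b)(.*)\babc\b", re.MULTILINE)

# \g<1> re-inserts whatever preceded abc on the line; XYZ is a placeholder.
result = pattern.sub(r"\g<1>XYZ", text)
print(result)
```

Note that this skips a whole line as soon as it contains qwe followed by whitespace and abc, and that it replaces at most one abc per line (the last one), because the match is anchored at the start of the line.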
anubhava answered Oct 21 '22 11:10


The article "The Best Regex Trick" describes an approach where you first match what you don't want using an alternation, and then capture what you do want inside a capture group.

The syntax would be: MatchWhatYouDon'tWant|(MatchWhatYouDoWant). In your particular case we can add word boundaries and nest the alternation in a non-capturing group:

\b(?:qwe\b\s+abc|(abc))\b


  • \b - Word-boundary.
  • (?: - Open non-capturing group:
    • qwe\b\s+abc - Match "qwe" literally followed by a word-boundary, 1+ whitespace characters and "abc".
    • | - Or:
    • (abc) - Match "abc" within the 1st capturing group.
    • ) - Close non-capturing group.
  • \b - Word-boundary.
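
To use this with re.sub, one option is to pass a function as the replacement and return the matched text unchanged whenever group 1 did not participate in the match. The following is a minimal sketch under that assumption; the function name replace_wanted and the replacement string XYZ are hypothetical.

```python
import re

text = "qwe  abc\nqwe abc\nabc"
pattern = re.compile(r"\b(?:qwe\b\s+abc|(abc))\b")

def replace_wanted(match):
    # Group 1 is None when the unwanted "qwe ... abc" branch matched:
    # return the matched text as-is so the string is left untouched there.
    if match.group(1) is None:
        return match.group(0)
    return "XYZ"  # placeholder replacement for a wanted abc

result = pattern.sub(replace_wanted, text)
print(result)
```

This explains why re.sub with a plain replacement string replaces everything: the overall match succeeds in both branches, so the decision has to be made per match based on whether group 1 is set.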
JvdV answered Oct 21 '22 10:10


The reason you match abc in group 1 for all 3 examples is that your pattern (?<!qwe)\s*?(abc) asserts, at the current position, that what is directly to the left is not qwe, and then matches optional whitespace chars.

For the first 2 examples this assertion is true at the position right after the first space that follows qwe, because the 3 characters to the left of that position are "we " rather than qwe. The regex engine is free to start the match at that position, so the lookbehind never gets to see qwe.

The third example gets a match as there is no qwe to the left.

Note that, for example, there will be no match for qweabc, as there is no whitespace char between qwe and abc that the match could start after to make the assertion true.
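
The positions described above can be checked with a short sketch that prints where the original pattern starts and ends its match for each sample (the dictionary name spans is just an illustrative choice):

```python
import re

pattern = re.compile(r"(?<!qwe)\s*?(abc)")

spans = {}
for sample in ["qwe  abc", "qwe abc", "abc", "qweabc"]:
    match = pattern.search(sample)
    # span() gives (start, end) of the whole match, or None if nothing matched
    spans[sample] = match.span() if match else None
    print(repr(sample), spans[sample])
```

In the first two samples the match starts at index 4, one character after the first space, which is exactly where the lookbehind no longer sees qwe.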


re does not support variable-length lookbehinds, but the PyPI regex module does:

(?<!qwe\s*)abc
  • (?<!qwe\s*) Negative lookbehind to assert that what is directly to the left is not qwe followed by optional whitespace chars.
  • abc Match literally (You don't need the group anymore)
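
A minimal sketch of using this pattern for a replacement, assuming the third-party regex module is installed (pip install regex). The guarded import and the replacement string XYZ are additions for illustration only.

```python
try:
    import regex  # third-party module from PyPI, not part of the standard library
except ImportError:
    regex = None  # the sketch below is skipped when the module is missing

text = "qwe  abc\nqwe abc\nabc"

if regex is not None:
    # regex.sub mirrors re.sub, but accepts the variable-length lookbehind
    result = regex.sub(r"(?<!qwe\s*)abc", "XYZ", text)
else:
    result = None
print(result)
```

Unlike the alternation approach, no callback is needed here, because the pattern only ever matches the abc occurrences that should be replaced.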


The fourth bird answered Oct 21 '22 12:10