
How does this regex work when finding the last occurrence of a word?

Tags:

regex

I came across a regex like the following:

foo(?!.*foo)

If it is fed with foo bar bar foo, it will find the last occurrence of foo. I know it uses a mechanism called negative lookahead, which means it will match a word that is not followed by the characters after the ?!. But how does the regex here work?

photosynthesis asked May 20 '14 04:05


People also ask

How does a regex pattern work?

A regex pattern matches a target string. The pattern is composed of a sequence of atoms. An atom is a single point within the regex pattern that the engine tries to match against the target string. The simplest atom is a literal, but treating several parts of the pattern as one atom requires grouping them with the ( ) metacharacters.
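As a small illustration of grouping atoms, the sketch below uses Python's re module (Python and the sample strings are assumptions for illustration; the page itself names no language). Without parentheses, + repeats only the preceding atom b; with ( ), the whole group ab becomes one atom that + repeats:

```python
import re

# Without a group, + applies only to the single atom "b".
print(bool(re.fullmatch(r"ab+", "abbb")))    # True
print(bool(re.fullmatch(r"ab+", "abab")))    # False

# With ( ), the group "ab" is treated as one atom that + repeats.
print(bool(re.fullmatch(r"(ab)+", "abab")))  # True
print(bool(re.fullmatch(r"(ab)+", "abbb")))  # False
```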

What does '$' mean in regex?

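$ means "Match the end of the string" (the position after the last character in the string). In many engines, including Python's re, it also matches just before a newline that ends the string, and with the multiline flag it matches at the end of every line.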
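A short sketch of these anchor rules, assuming Python's re module (other engines differ; JavaScript's $, for instance, does not match before a trailing newline unless the m flag is set):

```python
import re

# $ matches at the very end of the string...
print(re.search(r"foo$", "bar foo") is not None)      # True
# ...and, in Python, also just before a single trailing newline.
print(re.search(r"foo$", "bar foo\n") is not None)    # True
# \Z matches only at the absolute end of the string.
print(re.search(r"foo\Z", "bar foo\n") is not None)   # False
# With re.MULTILINE, $ also matches at the end of every line.
print(re.findall(r"\w+$", "one\ntwo\nthree", re.MULTILINE))  # ['one', 'two', 'three']
```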

How do you find how many occurrences of a regex pattern were replaced in a string?

To count how many times a regex pattern matches in a given string, use len(re.findall(pattern, string)), which returns the number of matching substrings, or len([*re.finditer(pattern, text)]), which unpacks all match objects into a list and returns its length. To count the replacements actually made, re.subn(pattern, repl, string) returns a tuple of the new string and the number of substitutions.
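A minimal sketch of the three approaches in Python, reusing the question's sample string (the replacement text "baz" is an arbitrary choice for illustration):

```python
import re

text = "foo bar bar foo"
pattern = r"foo"

# Count non-overlapping matches in two equivalent ways.
print(len(re.findall(pattern, text)))       # 2
print(len([*re.finditer(pattern, text)]))   # 2

# re.subn does the replacement and also reports how many substitutions it made.
new_text, n = re.subn(pattern, "baz", text)
print(new_text, n)                          # baz bar bar baz 2
```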

How do you match everything after a word in regex?

To match everything after the first occurrence of a word, follow the word with [\s\S]*. Whitespace characters (\s) include spaces, tabs, line breaks, etc., while non-whitespace characters (\S) include all letters, numbers, and punctuation. So the character class [\s\S], which combines the two, matches any character, including line breaks.
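A small Python sketch of this technique; the word "header" and the sample text are hypothetical, chosen only to show how [\s\S]* differs from .* across line breaks:

```python
import re

text = "header\nline one\nline two"

# [\s\S]* matches any run of characters, including newlines,
# so group 1 captures everything after the first "header".
m = re.search(r"header([\s\S]*)", text)
print(repr(m.group(1)))   # '\nline one\nline two'

# By contrast, .* stops at the first newline unless re.DOTALL is used.
m2 = re.search(r"header(.*)", text)
print(repr(m2.group(1)))  # ''
```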


3 Answers

This is a slightly different answer from sshashank's, because the word "containing" in his answer doesn't work for me, and in regex you have to be pedantic: it's all about precision. I'm 100% sure sshashank knows this and only phrased it that way for brevity.

The regex matches foo when it is not followed (negative lookahead, (?!...)) by this:

any number of any characters (.*), then the characters foo

The negative lookahead succeeds only when .*foo cannot match from the current position, that is, when no further foo appears later on the same line. That is why only the last foo matches.
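To see this on the question's input, here is a minimal sketch assuming Python's re module; the helper name last_foo_start is hypothetical, introduced only for illustration:

```python
import re
from typing import Optional

def last_foo_start(text: str) -> Optional[int]:
    """Start index of the first 'foo' with no later 'foo' on its line
    (for single-line text, that is the last 'foo'), via foo(?!.*foo)."""
    m = re.search(r"foo(?!.*foo)", text)
    return m.start() if m else None

text = "foo bar bar foo"
# At index 0 the inner pattern .*foo can match " bar bar foo", so the negative
# lookahead fails and the engine moves on; at index 12 nothing follows, so it succeeds.
print(last_foo_start(text))                # 12
print(text.rfind("foo"))                   # 12 (same position, found without a regex)
print(re.findall(r"foo(?!.*foo)", text))   # ['foo']
```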

See this automatic translation:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  foo                      'foo'
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    foo                      'foo'
--------------------------------------------------------------------------------
  )                        end of look-ahead

The same in different words from regex101:

/foo(?!.*foo)/

foo matches the characters foo literally (case sensitive)
(?!.*foo) Negative Lookahead - Assert that it is impossible to match the regex below
    .* matches any character (except newline)
        Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    foo matches the characters foo literally (case sensitive)

What does RegexBuddy have to say?


foo(?!.*foo)
  • Match the character string “foo” literally (case sensitive) foo
  • Assert that it is impossible to match the regex below starting at this position (negative lookahead) (?!.*foo)
    • Match any single character that is NOT a line break character (line feed, carriage return, next line, line separator, paragraph separator) .*
      • Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
    • Match the character string “foo” literally (case sensitive) foo
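All three tools note that . excludes line breaks, so the "last" foo is really the last one on its line. A sketch of that caveat in Python (the two-line sample text is hypothetical); re.DOTALL makes . match \n so the lookahead covers the whole string:

```python
import re

text = "foo\nfoo bar"

# Default: . does not match \n, so the lookahead only sees the rest of the current line.
m_default = re.search(r"foo(?!.*foo)", text)
print(m_default.start())   # 0  (the next foo is on another line)

# re.DOTALL lets . match \n, so the lookahead scans to the end of the whole string.
m_dotall = re.search(r"foo(?!.*foo)", text, re.DOTALL)
print(m_dotall.start())    # 4  (the last foo in the string)
```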
zx81 answered Apr 02 '23 20:04


It matches foo only if it is not followed (?!) by any more text (.*) containing foo in it.

sshashank124 answered Apr 02 '23 20:04


Negative lookahead is essential if you want to match something not followed by something else.

Short explanation:

foo(?!.*foo) matches foo when it is not followed by any number of characters (other than \n) and then `foo`.

For example, say you have the following two strings.

foobar
barfoo

And the regular expression:

foo(?!bar)

This matches foo when it is not followed by bar, so here it would match in the string barfoo but not in foobar.
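A quick check of the two example strings, sketched with Python's re module (Python is an assumption; the answer names no language):

```python
import re

results = {}
for s in ["foobar", "barfoo"]:
    m = re.search(r"foo(?!bar)", s)
    results[s] = m.start() if m else None
    print(s, "->", results[s])
# foobar -> None
# barfoo -> 3
```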

hwnd answered Apr 02 '23 19:04