How does the regular expression ‘(?<=#)[^#]+(?=#)’ work?

I have the following regex in a C# program, and I'm having difficulty understanding it:

(?<=#)[^#]+(?=#) 

I'll break it down into what I think I understand:

(?<=#)    a group, matching a hash. What's `?<=`?
[^#]+     one or more non-hashes (used to achieve non-greediness)
(?=#)     another group, matching a hash. What's the `?=`?

So what I don't understand are the ?<= and ?= parts. From reading MSDN, (?<name>...) is used for naming groups, but in this case the angle bracket is never closed.

I couldn't find ?= in the docs, and searching for it is really difficult, because search engines will mostly ignore those special chars.

knittl asked Jun 22 '10 at 11:06

1 Answer

They are called lookarounds: they assert whether a pattern matches at the current position without consuming any characters, so the text they check does not become part of the match. There are 4 basic lookarounds:

  • Positive lookarounds: see if we CAN match the pattern...
    • (?=pattern) - ... to the right of current position (look ahead)
    • (?<=pattern) - ... to the left of current position (look behind)
  • Negative lookarounds: see if we CANNOT match the pattern...
    • (?!pattern) - ... to the right
    • (?<!pattern) - ... to the left
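Here is a small sketch of all four lookarounds. It uses Python's `re` module as a stand-in for the .NET engine from the question (an assumption, though the syntax of these four constructs is the same in both), and the sample strings are made up for illustration. Single digits (`\d`, not `\d+`) are used so that backtracking into a longer number can't muddy the results.

```python
import re

text_ahead = "5 dollars and 7 euros"
text_behind = "$5 and 7"

# (?=...)  positive lookahead: a digit that IS followed by " dollars"
pos_ahead = re.findall(r"\d(?= dollars)", text_ahead)
# (?!...)  negative lookahead: a digit that is NOT followed by " dollars"
neg_ahead = re.findall(r"\d(?! dollars)", text_ahead)
# (?<=...) positive lookbehind: a digit that IS preceded by "$"
pos_behind = re.findall(r"(?<=\$)\d", text_behind)
# (?<!...) negative lookbehind: a digit that is NOT preceded by "$"
neg_behind = re.findall(r"(?<!\$)\d", text_behind)

print(pos_ahead, neg_ahead)    # ['5'] ['7']
print(pos_behind, neg_behind)  # ['5'] ['7']
```

In every case the text inside the lookaround (" dollars" or "$") is checked but not returned as part of the match.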

As an easy reminder, for a lookaround:

  • = is positive, ! is negative
  • < is look behind, otherwise it's look ahead
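Applied to the pattern in the question: (?<=#) asserts that a # is immediately to the left, [^#]+ consumes the text, and (?=#) asserts that a # is immediately to the right. A minimal sketch, again with Python's `re` standing in for .NET's `Regex` and made-up input strings:

```python
import re

# (?<=#)  lookbehind: a '#' must sit immediately to the LEFT of the match
# [^#]+   consume one or more characters that are not '#'
# (?=#)   lookahead: a '#' must sit immediately to the RIGHT of the match
PATTERN = r"(?<=#)[^#]+(?=#)"

simple = re.findall(PATTERN, "x#foo#bar#y")
print(simple)  # ['foo', 'bar']

# Any run of non-'#' text between two '#'s qualifies, including spaces
spaced = re.findall(PATTERN, "#one# and #two#")
print(spaced)  # ['one', ' and ', 'two']
```

`x` and `y` are skipped because each lacks a # on one side. Because [^#]+ cannot cross a #, it stops at the next one even though + is greedy, which is the "non-greediness" the question mentions.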

References

  • regular-expressions.info/Lookarounds

But why use lookarounds?

One might argue that the lookarounds in the pattern above aren't necessary, and that #([^#]+)# will do the job just fine (extracting the string captured by \1 to get the non-# text).
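The capturing alternative can be sketched like this (Python's `re` again; the input string is made up). Group 0 is the whole match including both #s, and group 1, the \1 above, holds only the text between them:

```python
import re

# The # characters are part of the overall match (group 0);
# the capturing group (group 1) holds only the text between them.
m = re.search(r"#([^#]+)#", "x#foo#y")
print(m.group(0))  # #foo#
print(m.group(1))  # foo
```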

Not quite. The difference is that since a lookaround doesn't consume the #, that # can be "used" again by the next attempt to find a match. Simplistically speaking, lookarounds allow "matches" to overlap.
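The "doesn't consume the #" point can be checked directly: a lookaround on its own produces an empty (zero-width) match, so the # it inspected is still available to whatever comes next. A short Python sketch with a made-up string:

```python
import re

text = "ab#cd"

# A bare lookahead matches the empty string at the position just before '#'
ahead = re.search(r"(?=#)", text)
# A bare lookbehind matches the empty string at the position just after '#'
behind = re.search(r"(?<=#)", text)

print(ahead.span(), repr(ahead.group()))    # (2, 2) ''
print(behind.span(), repr(behind.group()))  # (3, 3) ''
```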

Consider the following input string:

and #one# and #two# and #three#four# 

Now, #([a-z]+)# will give the following matches (as seen on rubular.com):

and #one# and #two# and #three#four#
    \___/     \___/     \_____/
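The same matches can be reproduced outside rubular.com. This sketch uses Python's `re` (an assumption; an engine with the same syntax should agree) and prints each match's span (the whole match, both #s included) alongside the captured word:

```python
import re

text = "and #one# and #two# and #three#four#"

# Each match consumes both surrounding '#' characters
found = [(m.span(), m.group(1)) for m in re.finditer(r"#([a-z]+)#", text)]
print(found)
# [((4, 9), 'one'), ((14, 19), 'two'), ((24, 31), 'three')]
```

`four` is never found: the # in front of it was already consumed as the closing # of `#three#`.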

Compare this with (?<=#)[a-z]+(?=#), which matches:

and #one# and #two# and #three#four#
     \_/       \_/       \___/ \__/
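Running the lookaround version on the same string (Python's `re` again) shows the spans now exclude the #s, and that the # at index 30 serves as the lookahead for `three` and as the lookbehind for `four`:

```python
import re

text = "and #one# and #two# and #three#four#"

# The '#' characters are only checked, never consumed
found = [(m.span(), m.group()) for m in re.finditer(r"(?<=#)[a-z]+(?=#)", text)]
print(found)
# [((5, 8), 'one'), ((15, 18), 'two'), ((25, 30), 'three'), ((31, 35), 'four')]
```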

Unfortunately, when this answer was written, this couldn't be demonstrated on rubular.com, since it didn't support lookbehind. It did support lookahead, though, so we can do something similar with #([a-z]+)(?=#), which matches (as seen on rubular.com):

and #one# and #two# and #three#four#
    \__/      \__/      \____/\___/
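A sketch of the lookahead-only variant in Python's `re`: the leading # is consumed, but the trailing one is merely checked, so the match for `#three` ends at index 30 and the match for `#four` starts there:

```python
import re

text = "and #one# and #two# and #three#four#"

# Leading '#' is consumed; trailing '#' is only asserted by the lookahead
found = [(m.span(), m.group(1)) for m in re.finditer(r"#([a-z]+)(?=#)", text)]
print(found)
# [((4, 8), 'one'), ((14, 18), 'two'), ((24, 30), 'three'), ((30, 35), 'four')]
```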

References

  • regular-expressions.info/Flavor Comparison
polygenelubricants answered Sep 30 '22 at 00:09