
How does the regular expression ‘(?<=#)[^#]+(?=#)’ work?


I have the following regex in a C# program, and I have difficulty understanding it:

(?<=#)[^#]+(?=#)

I'll break it down to what I think I understood:

(?<=#)    a group, matching a hash. what's `?<=`?
[^#]+     one or more non-hashes (used to achieve non-greediness)
(?=#)     another group, matching a hash. what's the `?=`?

So the problem I have is the ?<= and ?< part. From reading MSDN, ?<name> is used for naming groups, but in this case the angle bracket is never closed.

I couldn't find ?= in the docs, and searching for it is really difficult, because search engines will mostly ignore those special chars.


asked Jun 22 '10 11:06

knittl

1 Answer

These are called lookarounds. They let you assert whether a pattern matches at the current position without consuming any characters, so the asserted text does not become part of the match. There are four basic lookarounds:

Positive lookarounds: see if we CAN match the pattern...
- (?=pattern) - ... to the right of the current position (lookahead)
- (?<=pattern) - ... to the left of the current position (lookbehind)
Negative lookarounds: see if we can NOT match the pattern...
- (?!pattern) - ... to the right
- (?<!pattern) - ... to the left

As an easy reminder, for a lookaround:

- = is positive, ! is negative
- < is lookbehind, otherwise it's lookahead
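
To see the four forms side by side, here is a minimal sketch. It uses Python's re module purely for illustration (an assumption on my part, since the question is about C#); .NET's Regex class accepts the same lookaround syntax. The input string "a#b" and the variable names are made up for the example.

```python
import re

text = "a#b"

# (?=#)  positive lookahead: the next character must be '#'
ahead = re.findall(r"[a-z](?=#)", text)

# (?<=#) positive lookbehind: the previous character must be '#'
behind = re.findall(r"(?<=#)[a-z]", text)

# (?!#)  negative lookahead: the next character must not be '#'
not_ahead = re.findall(r"[a-z](?!#)", text)

# (?<!#) negative lookbehind: the previous character must not be '#'
not_behind = re.findall(r"(?<!#)[a-z]", text)

print(ahead, behind, not_ahead, not_behind)  # ['a'] ['b'] ['b'] ['a']
```

Note that the '#' never appears in any result: it is only checked, not matched.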

References

regular-expressions.info/Lookarounds

But why use lookarounds?

One might argue that lookarounds in the pattern above aren't necessary, and #([^#]+)# will do the job just fine (extracting the string captured by \1 to get the non-#).

Not quite. The difference is that a lookaround only asserts the # without consuming it, so the same # can be "used" again by the next attempt to find a match. Put simply, lookarounds allow "matches" to overlap on their delimiters.

Consider the following input string:

and #one# and #two# and #three#four#

Now, #([a-z]+)# will give the following matches (as seen on rubular.com):

and #one# and #two# and #three#four#
    \___/     \___/     \_____/

Compare this with (?<=#)[a-z]+(?=#), which matches:

and #one# and #two# and #three#four#
     \_/       \_/       \___/ \__/
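
The comparison above can be reproduced with a short sketch. Python's re module is used here only as a stand-in that supports lookbehind (an assumption for illustration; a C# program would pass the same patterns to Regex.Matches). The variable names are hypothetical.

```python
import re

text = "and #one# and #two# and #three#four#"

# Consuming pattern: each '#' is part of the match, so the '#' that
# closes "#three#" is no longer available to open "#four#".
consuming = [m.group(1) for m in re.finditer(r"#([a-z]+)#", text)]

# Lookaround pattern: the '#' characters are asserted but not consumed,
# so the '#' between "three" and "four" serves both matches.
lookaround = re.findall(r"(?<=#)[a-z]+(?=#)", text)

print(consuming)   # ['one', 'two', 'three']
print(lookaround)  # ['one', 'two', 'three', 'four']
```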

Unfortunately this can't be demonstrated on rubular.com, since it doesn't support lookbehind. However, it does support lookahead, so we can do something similar with #([a-z]+)(?=#), which matches (as seen on rubular.com):

and #one# and #two# and #three#four#
    \__/      \__/      \____/\___/
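
The lookahead-only variant can be checked the same way. This is again a small sketch in Python's re module, chosen for illustration rather than taken from the original program; it prints both the whole match and the captured group so the consumed leading # is visible.

```python
import re

text = "and #one# and #two# and #three#four#"

# The leading '#' is consumed, the trailing '#' is only looked at,
# so it is still available as the leading '#' of the next match.
matches = [(m.group(0), m.group(1))
           for m in re.finditer(r"#([a-z]+)(?=#)", text)]

print(matches)
# [('#one', 'one'), ('#two', 'two'), ('#three', 'three'), ('#four', 'four')]
```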

References

regular-expressions.info/Flavor Comparison


answered Sep 30 '22 00:09

polygenelubricants
