PHP regex crashing apache

Question

I've got a regex that does matching for a template system, which unfortunately seems to crash Apache (it's running on Windows) on some fairly trivial lookups. I've researched the issue and found a few suggestions for upping the stack size etc., none of which seem to work. I don't really like dealing with such issues by upping limits anyway, as it generally just pushes the bug into the future.

Anyway, any ideas on how to alter the regex to make it less likely to foul up?

The idea is to catch the innermost block (in this case {block:test}This should be caught first!{/block:test}), str_replace out its starting/ending tags, and then re-run the whole thing through the regex until there are no blocks left.

Regex:

~(?P<opening>{(?P<inverse>[!])?block:(?P<name>[a-z0-9\s_-]+)})(?P<contents>(?:(?!{/?block:[0-9a-z-_]+}).)*)(?P<closing>{/block:\3})~ism

Sample template:

<div class="f_sponsors s_banners">
    <div class="s_previous">&laquo;</div>
    <div class="s_sponsors">
        <ul>
            {block:sponsors}
            <li>
                <a href="{var:url}" target="_blank">
                    <img src="image/160x126/{var:image}" alt="{var:name}" title="{var:name}" />
                </a>
            {block:test}This should be caught first!{/block:test}
            </li>
            {/block:sponsors}
        </ul>
    </div>
    <div class="s_next">&raquo;</div>
</div>

It's a long shot I suppose. :(

Alan Moore · Accepted Answer

Try this one:

'~(?P<opening>\{(?P<inverse>[!])?block:(?P<name>[a-z0-9\s_-]+)\})(?P<contents>[^{]*(?:\{(?!/block:(?P=name)\})[^{]*)*)(?P<closing>\{/block:(?P=name)\})~i'

Or, in readable form:

'~(?P<opening>
  \{
  (?P<inverse>[!])?
  block:
  (?P<name>[a-z0-9\s_-]+)
  \}
)
(?P<contents>
  [^{]*(?:\{(?!/block:(?P=name)\})[^{]*)*
)
(?P<closing>
  \{
  /block:(?P=name)
  \}
)~ix'

The most important part is in the (?P<contents>..) group:

[^{]*(?:\{(?!/block:(?P=name)\})[^{]*)*

Starting out, the only character we're interested in is the opening brace, so we can slurp up any other characters with [^{]*. Only after we see a { do we check to see if it's the beginning of a {/block} tag. If it isn't, we go ahead and consume it and start scanning for the next one, and repeat as necessary.
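Here's a minimal sketch of that pattern in action, assuming a cut-down version of the sample template (the shortened $template string is mine, not the asker's exact markup). At this stage the regex still matches the outermost block, because the contents group only refuses the closing tag that carries the same name:

```php
<?php
// The one-line regex from above, split across lines only for readability.
$pattern = '~(?P<opening>\{(?P<inverse>[!])?block:(?P<name>[a-z0-9\s_-]+)\})'
         . '(?P<contents>[^{]*(?:\{(?!/block:(?P=name)\})[^{]*)*)'
         . '(?P<closing>\{/block:(?P=name)\})~i';

// Assumption: a shortened stand-in for the sample template in the question.
$template = '<ul>{block:sponsors}<li><a href="{var:url}">{var:name}</a>'
          . '{block:test}This should be caught first!{/block:test}</li>'
          . '{/block:sponsors}</ul>';

if (preg_match($pattern, $template, $m) === 1) {
    // The first opening tag wins, so this reports the outer block.
    echo $m['name'], "\n";
    // Everything between the outer tags, nested {block:test} included.
    echo $m['contents'], "\n";
}
```

The {var:...} tags and the nested block are consumed one brace at a time; the lookahead only runs at those brace positions.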

Using RegexBuddy, I tested each regex by placing the cursor at the beginning of the {block:sponsors} tag and debugging. Then I removed the ending brace from the closing {/block:sponsors} tag to force a failed match and debugged it again. Your regex took 940 steps to succeed and 2265 steps to fail. Mine took 57 steps to succeed and 83 steps to fail.

On a side note, I removed the s modifier because I'm not using the dot (.), and the m modifier because it was never needed. I also used the named backreference (?P=name) instead of \3 as per @DaveRandom's excellent suggestion. And I escaped all the braces ({ and }) because I find it easier to read that way.

EDIT: If you want to match the innermost named block, change the middle portion of the regex from this:

(?P<contents>
  [^{]*(?:\{(?!/block:(?P=name)\})[^{]*)*
)

...to this (as suggested by @Kobi in his comment):

(?P<contents>
  [^{]*(?:\{(?!/?block:[a-z0-9\s_-]+\})[^{]*)*
)

Originally, the (?P<opening>...) group would grab the first opening tag it saw, then the (?P<contents>..) group would consume anything--including other tags--as long as they weren't the closing tag to match the one found by the (?P<opening>...) group. (Then the (?P<closing>...) group would go ahead and consume that.)

Now, the (?P<contents>...) group refuses to match any tag, opening or closing (note the /? at the beginning), no matter what the name is. So the regex initially starts to match the {block:sponsors} tag, but when it encounters the {block:test} tag, it abandons that match and goes back to searching for an opening tag. It starts again at the {block:test} tag, this time successfully completing the match when it finds the {/block:test} closing tag.
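To make the "innermost first, then repeat" idea concrete, here's a rough sketch of the loop described in the question. The shortened $template and the $order array are my own illustration, and simply stripping the tags stands in for whatever the real template engine does with each block:

```php
<?php
// Innermost-block variant: the contents group rejects ANY block tag.
$pattern = '~(?P<opening>\{(?P<inverse>[!])?block:(?P<name>[a-z0-9\s_-]+)\})'
         . '(?P<contents>[^{]*(?:\{(?!/?block:[a-z0-9\s_-]+\})[^{]*)*)'
         . '(?P<closing>\{/block:(?P=name)\})~i';

// Assumption: a shortened stand-in for the sample template.
$template = '{block:sponsors}<li>{var:name} '
          . '{block:test}This should be caught first!{/block:test}</li>'
          . '{/block:sponsors}';

$order = [];
while (preg_match($pattern, $template, $m, PREG_OFFSET_CAPTURE) === 1) {
    // With PREG_OFFSET_CAPTURE each entry is [matched text, byte offset].
    $order[] = $m['name'][0];

    // Placeholder processing: drop the tags, keep the contents.
    $template = substr_replace(
        $template,
        $m['contents'][0],
        $m[0][1],
        strlen($m[0][0])
    );
}

print_r($order);        // block names in the order they were matched
echo $template, "\n";   // what is left once no blocks remain
```

On this input the loop picks up test on the first pass and sponsors on the second, then stops because no block tags remain.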

It sounds inefficient describing it like this, but it's really not. The trick I described earlier, slurping up the non-braces, drowns out the effect of these false starts. Where you were doing a negative lookahead at almost every position, now you're doing one only when you encounter a {. You could even use possessive quantifiers, as @godspeedlee suggested:

(?P<contents>
  [^{]*+(?:\{(?!/?block:[a-z0-9\s_-]+\})[^{]*+)*+
)

...because you know it will never consume anything that it will have to give back later. That would speed things up a little, but it isn't really necessary.
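If you want to convince yourself that the possessive version captures the same thing, a quick check along these lines should do. blockPattern() is a hypothetical helper I made up just to avoid repeating the opening and closing groups, and the $template is again a shortened stand-in:

```php
<?php
// Hypothetical helper: wraps a "contents" sub-pattern in the
// opening/closing groups used throughout this answer.
function blockPattern(string $contents): string
{
    return '~(?P<opening>\{(?P<inverse>[!])?block:(?P<name>[a-z0-9\s_-]+)\})'
         . '(?P<contents>' . $contents . ')'
         . '(?P<closing>\{/block:(?P=name)\})~i';
}

$greedy     = blockPattern('[^{]*(?:\{(?!/?block:[a-z0-9\s_-]+\})[^{]*)*');
$possessive = blockPattern('[^{]*+(?:\{(?!/?block:[a-z0-9\s_-]+\})[^{]*+)*+');

// Assumption: a shortened stand-in for the sample template.
$template = '{block:sponsors}<li>{var:name} '
          . '{block:test}This should be caught first!{/block:test}</li>'
          . '{/block:sponsors}';

preg_match($greedy, $template, $a);
preg_match($possessive, $template, $b);

// Both variants should land on the same innermost block.
var_dump($a['name'] === $b['name']);
var_dump($a['contents'] === $b['contents']);
```

This only confirms that the two variants agree on this input; it doesn't measure the speed difference.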

Tags: regex, php, windows, apache

Meep3D
