Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why not create a backreference?

I understand that putting ?: inside of the start of the parentheses of a regular expression will prevent it from creating a backreference, which is supposed to be faster. My question is, why do this? Is the speed increase noticeable enough to warrant this consideration? Under what circumstances is it going to matter so much that you need to carefully skip the backreference each time you are not going to use it. Another disadvantage is that it makes the regex harder to read, edit, and update (if you end up wanting to use a backreference later).

So in summary, why bother not creating a backreference?

like image 439
Explosion Pills Avatar asked Mar 14 '11 01:03

Explosion Pills


1 Answers

I think you're confusing backreferences like \1 and capturing groups (...).

Backreferences prevent all kinds of optimizations by making the language non-regular.

Capturing groups make the regular expression engine do a little more work to remember where a group starts and ends, but are not as bad as backreferences.

http://www.regular-expressions.info/brackets.html explains capturing groups and back references to them in detail.

EDIT:

On backreferences making regular expressions non-regular, consider the following regular expression which matches lua comments:

/^--(?:\[(=*)\[[\s\S]*?(?:\]\1\]|$)|[^\r\n]*)/

So --[[...]] is a comment, --[=[...]=] is a comment, --[==[...]==] is a comment. You can nest comments by adding extra equals signs between the square brackets.

This cannot be matched by a strictly regular language, so a simple finite state machine cannot handle it in O(n) time -- you need a counter.

Perl 5 regular expressions can handle this using back-references. But as soon as you require non-regular pattern matching, your regular expression library has to give up the simple state-machine approach and use more complex, less-efficient code.

like image 143
Mike Samuel Avatar answered Sep 19 '22 17:09

Mike Samuel