Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to match balanced delimiters in javascript regex?

I would have thought this problem to be impossible; as far as I know, Javascript's regex flavor has niether recursive interpolation nor the nifty .NET balancing groups feature. Yet there it is, as problem 12 on regex.alf.nu: match balanced pairs of < and >. Unless there's some other pattern in the sets I'm not getting.

So... is this possible? If so, how?

NOTES:

  1. I know that this is impossible for true regular expressions, but based on the challenge it seems that it must be possible in Javascript's flavor (which is at least irregular enough to have backreferences). I just don't know of any feature that would let them do this.

  2. No other code - the form allows entry of a single regex, which is evaluated against the test strings on the page. I could try to crack the page to break out of the regex and into raw JS, I suppose, but that doesn't seem to be in the spirit of this challenge.

Since David asked, here are the test strings. Longer ones have been truncated with a character count, but the problem is entitled "Balance" and the ones that are complete certainly support the hypothesis that the "match" column has balanced pairs of < and > while the "not" column doesn't.

Match all of these…

<<<<<>><<>>><<... [62 chars]
<<<<<>><>><<><... [110 chars]
<<<<<>><>><>><... [102 chars]
<<<<<>><>>>><<... [88 chars]
<<<<<>>><<<>><... [58 chars]
<<<<<>>><<><>>... [152 chars]
<<<<<>>><<>><<... [42 chars]
<<<<<>>><>><<<>>>><<>>
<<<<<>>>><<<<>... [102 chars]
<<<<<>>>><<<><... [30 chars]
<<<<<>>>><><<<... [66 chars]
<<<<<>>>><><<<... [124 chars]
<<<<<>>>><>><<>>
<<<<><<>>><<<>... [34 chars]
<<<<>><<<>>>><... [92 chars]
<<<<>>><<<<>><>><<<>>>>>
<<<<>>><<<><<>>><><<>>>><<>>
<<<<>>><<><<<>... [84 chars]
<<<<>>>><<<><<... [52 chars]
<<<><<<>>>><<<... [50 chars]
<<<><<><>>>>
<<<><>><<<>>>>
<<<>><<<><<>>>... [44 chars]
<<<>><><<<><>>... [48 chars]
<<<>>><<><<<<>>>><<><<<>>>>>
<<><<<<>><>>>>... [60 chars]
<<>>
<<>><<<<<>>>>>... [54 chars]
<<>><<<<>><<<>... [74 chars]
<>
<><>

and none of these…

<
<<<<<<>>><<><>>>>>><<>
<<<<<>>><>>><<<>>>><>>
<<<<<>>>>>>
<<<<>><<<<<><<>><><<<<
<<<>><<<<><><><><
<<<>>>><><<<><>
<<><<<<><<><<>>><<
<<><<<>>>>><<
<<>>><<<>>
<><<<>><<>>><<>
<><<>>><<<><>><<<>>><<>>>><
<><<>>><><<<>
<><>><>>><><<<... [36 chars]
<>><><<<><>
<>>>>>><<<>><<>><><
<>>>>>>><<<
>
><
><<<>><><<<><<
><<<>>>><><<<<><>>><<><><<
><<><<<<><<<<>>>><
><><><<<>>>>>
><><>>><>><>
><><>>>><>>>>>>><>>><>>
><>><<<<<>>
><>><><><<>><<>>><<
><>>><>>>>><<><<<><>><>><<<
>><<<><<<<<<><>><<
>><>>><<<><>>><><<>><<><><<
>>>><>><>>>><>>><>><><
>>>>><<<>>>
like image 749
Mark Reed Avatar asked Dec 22 '13 06:12

Mark Reed


1 Answers

I do not believe that this is possible in JavaScript, though it's hard to prove. For example, Java and PHP do not have the features that you mention (recursive interpolation, balancing groups), but this fascinating Stack Overflow answer shows how to match anbn using regexes in those languages. (Adapting that answer to the present case, the Java regex ^(?:(?:<(?=<*(\1?+>)))+\1)*$ should work. Correction: no, it's not that easily adapted.) But that answer depends on Java's support for the possessive quantifier ?+ (like ? except that you can't backtrack into it), and JavaScript doesn't have that.

That said, you can solve the referenced puzzle by writing this:

^(?:<(?:<(?:<(?:<(?:<(?:<(?:<>)*>)*>)*>)*>)*>)*>)*$

which matches up to seven levels of nesting. That's the most that any of the strings has, so it's all you need. (Several of the other puzzles at that page advise you to cheat because they're asking for something technically impossible; so while an elegant solution would obviously be more appealing, there's no reason to assume that one exists.)

like image 53
ruakh Avatar answered Oct 15 '22 19:10

ruakh