 

How to do a negative lookbehind within a %r<…>-delimited regexp in Ruby?

I like the %r<…> delimiters because they make it really easy to spot the beginning and end of the regex, and I don't have to escape any /. But it seems they have an insurmountable limitation that other delimiters don't?

Every other delimiter imaginable works fine:

/(?<!foo)/
%r{(?<!foo)}
%r[(?<!foo)]
%r|(?<!foo)|
%r/(?<!foo)/
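A quick way to confirm that these five literals really are the same regex is to compare their `source` strings. This is a minimal sketch; the variable name `patterns` is only for illustration:

```ruby
# All five spellings should compile to a Regexp with the same source text.
patterns = [
  /(?<!foo)/,
  %r{(?<!foo)},
  %r[(?<!foo)],
  %r|(?<!foo)|,
  %r/(?<!foo)/
]

patterns.each { |re| puts re.source }  # prints (?<!foo) five times
puts patterns.map(&:source).uniq.size  # 1
```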

But when I try to do this:

%r<(?<!foo)>

it gives this syntax error:

unterminated regexp meets end of file
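Because this error comes from the parser, a script containing that literal never runs at all. One way to observe it from inside Ruby is to pass the literal to `eval` as a string and rescue the `SyntaxError`. A sketch; the exact message text may differ between Ruby versions and parsers, so it is only printed, not checked:

```ruby
# The literal is kept inside a string so that this file itself still parses.
source = '%r<(?<!foo)>'

begin
  eval(source)
  puts "parsed fine"
rescue SyntaxError => e
  # SyntaxError is a ScriptError, not a StandardError, so it has to be named explicitly.
  puts "SyntaxError: #{e.message}"
end
```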

Okay, it probably doesn't like that the < isn't part of a balanced pair, but how do you escape it so that Ruby does accept it?

Does something need to be escaped?

According to wikibooks.org:

Any single non-alpha-numeric character can be used as the delimiter, %[including these], %?or these?, %~or even these things~. By using this notation, the usual string delimiters " and ' can appear in the string unescaped, but of course the new delimiter you've chosen does need to be escaped.

Indeed, escaping is needed in these examples:

%r!(?<\!foo)!
%r?(\?<!foo)?

But if that were the only problem, then I should be able to escape it like this and have it work:

%r<(?\<!foo)>

But that yields this error:

undefined group option: /(?\<!foo)/

So maybe escaping is not needed/allowed? wikibooks.org does list %<pointy brackets> as one of the exceptions:

However, if you use %(parentheses), %[square brackets], %{curly brackets} or %<pointy brackets> as delimiters then those same delimiters can appear unescaped in the string as long as they are in balanced pairs
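That balanced-pairs rule would explain the failure. Below is a deliberately simplified, hypothetical model of how a %-literal with paired delimiters could find its end: count unescaped opening and closing delimiters and stop when the count returns to zero. The helper name `literal_body` is made up for illustration, and Ruby's real lexer also deals with interpolation and other details this sketch ignores. Fed the text after `%r<`, the < in (?<! raises the depth to 2, so the final > only brings it back to 1 and the literal never closes:

```ruby
# Hypothetical, simplified model of how a %-literal with *paired* delimiters
# finds its end. It only counts unescaped opening and closing delimiters and
# ignores interpolation and everything else the real Ruby lexer handles.
#
# src   - the text that follows the opening delimiter (e.g. after "%r<")
# open  - the opening delimiter, e.g. "<"
# close - the closing delimiter, e.g. ">"
#
# Returns the literal's body, or nil if the delimiters never balance
# (this model's version of "unterminated regexp meets end of file").
def literal_body(src, open, close)
  depth = 1 # the literal's own opening delimiter
  i = 0
  while i < src.length
    ch = src[i]
    if ch == "\\"
      i += 2 # an escaped character never counts toward the nesting
      next
    end
    depth += 1 if ch == open
    depth -= 1 if ch == close
    return src[0...i] if depth.zero?
    i += 1
  end
  nil
end

p literal_body("(?<!foo)>", "<", ">")     # nil: the final ">" only closes the "<" of "(?<!"
p literal_body("(?<name>foo)>", "<", ">") # "(?<name>foo)"
p literal_body("(?<!foo)}", "{", "}")     # "(?<!foo)"
p literal_body("\\<foo>", "<", ">")       # "\\<foo": the escaped "<" is not counted
```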

Is it a problem with balanced pairs?

Balanced pairs are no problem as long as you are doing something in the Regexp that requires them, like...

%r{(?<!foo{1})}   # repetition quantifier
%r[(?<![foo])]    # character class
%r<(?<name>foo)>  # named capture group
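These three can be checked directly. A small sketch; the variable names and the test string "xfoo" are arbitrary:

```ruby
# Each literal contains its own delimiter characters, but only in balanced pairs.
quantifier = %r{(?<!foo{1})}   # { } balanced by the {1} quantifier
char_class = %r[(?<![foo])]    # [ ] balanced by the character class
named      = %r<(?<name>foo)>  # < > balanced by the group name

puts quantifier.source           # (?<!foo{1})
puts char_class.source           # (?<![foo])
puts named.match("xfoo")[:name]  # foo
```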

But what if you need to insert a left-side delimiter ({, [, or <) inside the regex? Just escape it, right? Ruby seems to have no problem with escaped unbalanced delimiters most of the time...

%r{(?<!foo\{)}
%r[(?<!\[foo)]
%r<\<foo>
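In cases like these the escaped delimiter appears to reach the regex engine as an ordinary escaped literal, so the pattern matches the delimiter character itself. A quick check, using sample strings chosen only for illustration:

```ruby
# An escaped, unbalanced opening delimiter is just a literal character to the regex engine.
p("a<foo" =~ %r<\<foo>)  # 1: matches the literal "<foo"
p("afoo" =~ %r<\<foo>)   # nil: there is no "<" before "foo"
p("foo{" =~ %r{foo\{})   # 0: matches the literal "foo{"
```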

It's only when you try to do it in the middle of the "group options" following a (? (which I guess is how the <! characters are classified here) that Ruby doesn't like it:

%r<(?\<!foo)>
# undefined group option: /(?\<!foo)/
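The error message suggests that, with the backslash, the literal now closes, but the backslash is passed through to the regex engine, which then sees (?\< and rejects it. Building the same source string at runtime with `Regexp.new` should reproduce that engine-level complaint as a `RegexpError`. A sketch; the exact message may vary between Ruby versions:

```ruby
# The same characters the literal hands to the regex engine, backslash included.
source = '(?\<!foo)'

begin
  Regexp.new(source)
  puts "compiled"
rescue RegexpError => e
  puts "RegexpError: #{e.message}"
end

# Without the backslash, the same group is a valid negative lookbehind.
puts Regexp.new('(?<!foo)').source  # (?<!foo)
```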

So how do you do that then and make Ruby happy? (without changing the delimiters)

Conclusion

The workaround is easy: I'll just change this particular regex to use a different delimiter, such as %r{…}.

But the questions remain...

  1. Is there really no way to escape the < here?
  2. Are there really some regular expressions that are simply impossible to write using certain delimiters like %r<…>?
  3. Is %r<…> the only regular expression delimiter pair that has this problem (where some regular expressions are impossible to write when using it)? If you know of a similar example with %r{…}/%r[…], do share!

Version info

Not that it likely matters, since this syntax probably hasn't changed, but I'm using:

⟫ ruby -v
ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-linux]

Reference:

  • https://ruby-doc.org/core-2.6.3/Regexp.html
  • % Notation
Tyler Rick asked Apr 25 '19 18:04



1 Answer

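As others have mentioned, this seems like an oversight that stems from how this character differs from the other paired delimiters. Unlike (, [ and {, which normally come in balanced pairs in regex syntax, < appears on its own in lookbehinds such as (?<! and (?<=.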

As for "Is there really no way to escape the < here?": there is a way, but you're not going to like it:

%r<(?#{'<'}!foo)> == %r((?<!foo))

Using interpolation to insert the < character seems to work. But given that there are much better options, I would avoid it unless you were already planning to split the regex into sections.
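The interpolation trick can be checked the same way. A minimal sketch: the #{'<'} sits inside an embedded Ruby expression, so presumably the lexer does not count that < toward the delimiter nesting, and the resulting source is an ordinary negative lookbehind. The trailing bar and the sample strings are only there to make the lookbehind observable:

```ruby
# The "<" is produced by interpolation, so it never unbalances the %r<...> literal.
lookbehind = %r<(?#{'<'}!foo)>
puts lookbehind.source            # (?<!foo)
puts lookbehind == %r((?<!foo))   # the comparison from the answer

# Appending "bar" shows that the lookbehind actually takes effect.
not_after_foo = %r<(?#{'<'}!foo)bar>
p("foobar" =~ not_after_foo)  # nil: "bar" is preceded by "foo"
p("xbar" =~ not_after_foo)    # 1
```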

user208769 answered Nov 15 '22 14:11
