 

What does "(?x::" mean in a Boost regex replacement, where "x" is a number?

This is a Perl-style regular expression found in one of the snippet files in the Ruby package of Sublime Text 2:

/(?:\A|_)([A-Za-z0-9]+)(?:\.rb)?/(?2::\u$1)/g

I know it converts a file name like "some_class.rb" into "SomeClass", but I can't figure out what this part does: (?2::. Sublime Text 2 uses Boost for its regular expressions, so I checked the documentation for Boost-Extended Format String Syntax and found that Boost supports conditionals in the format string (e.g. (?2(foo):(bar))), but you'd never need two colons for that. Additionally, ?2 would refer to the second sub-expression, but the expression above contains only one capturing sub-expression. For those reasons, I don't think this is a conditional expression.
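As a quick check of the point about sub-expressions, here is a sketch in Python (chosen only for illustration; Sublime Text 2 itself uses Boost). Python's `re`, like Boost and Perl, treats `(?:...)` as non-capturing, so the pattern reports exactly one capturing group, and each global match contributes one capture:

```python
import re

# The search pattern from the Ruby snippet, unchanged.
PATTERN = re.compile(r"(?:\A|_)([A-Za-z0-9]+)(?:\.rb)?")

# (?:...) groups do not capture, so only ([A-Za-z0-9]+) counts.
print(PATTERN.groups)  # 1

# Each global match yields one capture: the word between underscores.
captures = [m.group(1) for m in PATTERN.finditer("some_class.rb")]
print(captures)  # ['some', 'class']
```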

Thanks for any enlightening answers.

asked Jun 26 '13 13:06 by Hubro



2 Answers

First of all, the (?2::\u$1) on the replacement side of ///g is not Perl. It's Boost's own extension:

  • see [Search and Replace Format String Syntax > Boost-Extended Format String Syntax], as mentioned by the OP.

Referring to said document, it saith:

The character '?' begins a conditional expression, the general form is:

?Ntrue-expression:false-expression

where N is decimal digit.

If sub-expression N was matched, then true-expression is evaluated and sent to output, otherwise false-expression is evaluated and sent to output.
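The documented rule can be illustrated with a small Python sketch. Python's `re.sub` has no conditional replacement syntax, so the hypothetical helper `conditional` below emulates `?N` with a replacement function. In line with the reasoning about `?2` that follows, it treats a group number that does not exist in the pattern as unmatched:

```python
import re


def conditional(m, n, true_text, false_text):
    # Emit true_text if group n took part in the match, else false_text.
    # A group number beyond the pattern's group count counts as unmatched.
    matched = n <= m.re.groups and m.group(n) is not None
    return true_text if matched else false_text


# (a)|(b): exactly one of the two groups participates in each match.
result = re.sub(r"(a)|(b)", lambda m: conditional(m, 1, "A", "B"), "abba")
print(result)  # ABBA

# Group 3 does not exist, so the false branch is always chosen.
missing = re.sub(r"(a)|(b)", lambda m: conditional(m, 3, "T", "F"), "ab")
print(missing)  # FF
```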

Based on this, let's analyze the mysterious (?2::\u$1):

  1. ?2 is always false, because there is no 2nd capturing group.
  2. The first : is a "special character" in the true-expression, which means an empty string.
    • If we assume that the true-expression cannot be left empty, the first : is not interpreted as the separator between the true- and false-expression (for the gory/juicy details, please read Appendix D).
    • In fact, we can put whatever we want as the true-expression (as long as there is no : somewhere in the middle), since ?2 never evaluates to true.
  3. \u$1 is the false-expression.

Putting two and two together, I'm going to go out on a limb and say that

/(?:\A|_)([A-Za-z0-9]+)(?:\.rb)?/(?2::\u$1)/g

is but an obfuscated way of doing this:

/(?:\A|_)([A-Za-z0-9]+)(?:\.rb)?/\u$1/g
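The claimed equivalence can be checked with a rough Python sketch, assuming `\u` means "uppercase the next output character" and that a reference to a nonexistent group counts as unmatched. The helpers `upper_first`, `obfuscated` and `plain` are hypothetical names, and Python's `re.sub` with a replacement function stands in for Boost's format string:

```python
import re

PATTERN = re.compile(r"(?:\A|_)([A-Za-z0-9]+)(?:\.rb)?")


def upper_first(s):
    # Models \u: uppercase only the first character of what follows.
    return s[:1].upper() + s[1:]


def obfuscated(name):
    # Models (?2::\u$1): empty true-expression, \u$1 as false-expression.
    def repl(m):
        group2_matched = PATTERN.groups >= 2 and m.group(2) is not None
        return "" if group2_matched else upper_first(m.group(1))
    return PATTERN.sub(repl, name)


def plain(name):
    # Models \u$1 on its own.
    return PATTERN.sub(lambda m: upper_first(m.group(1)), name)


for name in ["some_class.rb", "hello_world", "a_b_c"]:
    print(name, "->", obfuscated(name), plain(name))
```

Since the pattern has no second group, `obfuscated` always takes the false branch and produces the same output as `plain`.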

Appendix D: Experimenting with a snippet in Sublime Text 2

So I defined a Sublime Text 2 snippet with this content:

<snippet>
    <content><![CDATA[
snakecase: ${1:hello_world}
camelcase: ${1/(?:\A|_)([A-Za-z0-9]+)(?:\.rb)?/(?2::\u$1)/g}
]]></content>
    <tabTrigger>convert</tabTrigger>
</snippet>

and played around with different expressions for the right side of the substitution.

Given the input hello_world:

  1. if the right side is (?2::\u$1), it returns HelloWorld
  2. if the right side is (?2:\u$1), it returns HW
  3. if the right side is (?2:$1), it returns nothing
  4. if the right side is (?2:::\u$1), it returns :Hello:World
  5. if the right side is (?1:\u$1), it returns HelloWorld
  6. if the right side is (?1::\u$1), it returns HW
  7. if the right side is (?1::$1), it returns nothing
  8. if the right side is (?1:::\u$1), it returns HW
  9. if the right side is (?1:::$1), it returns nothing
  10. if the right side is \u$1, it returns HelloWorld

Some tentative conclusions based on this (assuming that cases 2, 6 and 8 are anomalies):

  • if only a single colon (:) follows the digit, it is ignored (i.e. it is not interpreted as the separator between true- and false-expression).
  • if 2 colons (::) follow the digit, the true-expression is an empty string (the 2nd : is the separator).
  • if 3 colons (:::) follow the digit, the true-expression is an empty string and the false-expression starts with a literal colon (the 2nd : is the separator).
  • comparing cases 1 and 10, my conclusion on the equivalence of (?2::\u$1) and \u$1 still stands.

I call them anomalies because \u$1 behaves so differently from $1 in those cases (everything except the first character of the captured substring disappears).
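The tentative rules above can be written down as a small, hypothetical Python model. It is not Boost's or Sublime Text's actual parser; `expand` and `apply_format` are made-up names, and only `$N`, `\u` and literal text are supported. Under these rules it reproduces cases 1, 3, 4, 5, 7, 9 and 10; for the anomalous cases 2, 6 and 8 it returns an empty string rather than HW:

```python
import re

PATTERN = re.compile(r"(?:\A|_)([A-Za-z0-9]+)(?:\.rb)?")


def expand(template, m):
    # Expand a tiny subset of the format language: $N, a backslash-u
    # (uppercase the next emitted character), and literal characters.
    out, upper_next, i = [], False, 0
    while i < len(template):
        if template.startswith("\\u", i):
            upper_next, i = True, i + 2
            continue
        if template[i] == "$" and i + 1 < len(template) and template[i + 1].isdigit():
            text, i = m.group(int(template[i + 1])) or "", i + 2
        else:
            text, i = template[i], i + 1
        if upper_next and text:
            text, upper_next = text[0].upper() + text[1:], False
        out.append(text)
    return "".join(out)


def apply_format(fmt, subject):
    # Hypothetical model of the tentative rules, not Boost's real parser:
    # the colon right after the digit is consumed by the regex below, and
    # the next colon (if any) separates true- and false-expression.
    cond = re.fullmatch(r"\(\?(\d):(.*)\)", fmt, re.DOTALL)

    def repl(m):
        if cond is None:
            return expand(fmt, m)  # plain format string such as \u$1
        n, body = int(cond.group(1)), cond.group(2)
        true_expr, _, false_expr = body.partition(":")
        matched = n <= m.re.groups and m.group(n) is not None
        return expand(true_expr if matched else false_expr, m)

    return PATTERN.sub(repl, subject)


for fmt in [r"(?2::\u$1)", r"(?2:$1)", r"(?2:::\u$1)", r"(?1:\u$1)",
            r"(?1::$1)", r"(?1:::$1)", r"\u$1"]:
    print(fmt, "->", repr(apply_format(fmt, "hello_world")))
```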

answered Sep 27 '22 15:09 by doubleDown



Maybe there's a colon in the replacement string, i.e. :\u$1 is the replacement string.

answered Sep 27 '22 15:09 by shawnhcorey
