
I would like to mimic conditionals in JavaScript regex

This is what I have so far...

var regex_string = "s(at)?u(?(1)r|n)day"
console.log("Before: " + regex_string)
// Rewrite each conditional (?(n)a|b) as ((?!\n)a|\nb)
regex_string = regex_string.replace(/\(\?\((\d)\)(.+?\|)(.+?)\)/g, '((?!\\$1)$2\\$1$3)')
console.log("After: " + regex_string)
var rex = new RegExp(regex_string)

var arr = "thursday tuesday thuesday tursday saturday sunday surday satunday monday".split(" ")
// Index loop instead of for...in, which leaked a global i and is meant for object keys
for (var i = 0; i < arr.length; i++) {
  var m = arr[i].match(rex)
  if (m) {
    console.log(m[0])
  }
}

I am swapping (?(n)a|b) for ((?!\n)a|\nb) where n is a number, and a and b are strings. This seems to work fine - however, I am aware that it is a big fat hack.

Is there a better way to approach this problem?

Billy Moon asked Feb 22 '13 11:02



1 Answer

In the specific case of your regex, it is much simpler and more readable to use alternation:

(?:sunday|saturday)

Or you can limit the alternation to the part of the pattern that the conditional affects. (This is more useful when there are many such conditional expressions, each referring only to a nearby capturing group.) In your case, only un and atur are involved in the condition, so the alternation covers just those:

s(?:un|atur)day
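
As a quick check, this pattern can be run against the word list from the question. This is only a minimal sketch; the names words, dayPattern, and matched are chosen here for illustration:

```javascript
// Word list taken from the question.
const words = "thursday tuesday thuesday tursday saturday sunday surday satunday monday".split(" ");

// Alternation limited to the part that the conditional would affect.
const dayPattern = /s(?:un|atur)day/;

// Keep only the words the pattern matches.
const matched = words.filter((word) => dayPattern.test(word));
console.log(matched);
```

Only saturday and sunday should pass the filter, which is what the conditional version is meant to accept.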

There are 2 common types of conditional regex. (Perl regular expressions support more exotic variants, but those rely on features that JavaScript and other common regex engines don't have.)

  1. The first type is where an explicit pattern is provided as the condition. This type can be mimicked in JavaScript regex. In languages that support conditional regex, the pattern is:

    (?(conditional-pattern)yes-pattern|no-pattern)
    

    In JavaScript, you can mimic it with look-ahead, with the (obvious) assumption that the original conditional-pattern is a look-ahead:

    ((?=conditional-pattern)yes-pattern|(?!conditional-pattern)no-pattern)
    

    The negative look-ahead is necessary to prevent the case where the input passes the conditional-pattern and fails the yes-pattern, but then matches the no-pattern. It is safe to add, because a positive look-around and a negative look-around on the same pattern are exact logical opposites.

  2. The second type is where a reference to a capturing group is provided (by name or number), and the condition evaluates to true when the capturing group has a match. In this case, there is no simple solution.

    The only way I can think of is duplication, as I did above with your regex. This of course reduces maintainability. It is possible to compose your regex by writing the parts as RegExp literals, retrieving each part's string with the source property, and concatenating them; this lets a change propagate to every duplicated part, but makes the regex harder to understand and harder to modify substantially.
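
The two approaches in the list above can be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: the type 1 conditional (?(?=\d)\d{3}|\w+) is a made-up example, and every variable name (guarded, unguarded, start, group, middle, end, composed, and so on) is hypothetical. The type 2 part rebuilds the regex from the question out of RegExp literals joined through their source property.

```javascript
// Type 1: emulate (?(?=\d)\d{3}|\w+) with a pair of look-aheads.
// The example conditional is an assumption made for illustration.
const guarded = /^(?:(?=\d)\d{3}|(?!\d)\w+)$/;
// Same pattern without the negative look-ahead, for comparison only.
const unguarded = /^(?:(?=\d)\d{3}|\w+)$/;

const samples = ["123", "abc", "12ab"];
// "12ab" passes the condition but fails the yes-pattern, so it must not match.
const guardedResults = samples.map((s) => guarded.test(s));
// Without the guard, "12ab" falls through to the no-pattern and matches.
const unguardedResults = samples.map((s) => unguarded.test(s));
console.log(guardedResults, unguardedResults);

// Type 2: emulate s(at)?u(?(1)r|n)day by duplicating the shared parts.
// Each shared part is written once as a literal and reused via .source.
const start = /s/;
const group = /(at)/;
const middle = /u/;
const end = /day/;

// Branch taken when the capturing group participates in the match.
const whenGroupMatched = start.source + group.source + middle.source + "r" + end.source;
// Branch taken when the capturing group is absent.
const whenGroupMissing = start.source + middle.source + "n" + end.source;
const composed = new RegExp("(?:" + whenGroupMatched + "|" + whenGroupMissing + ")");

const words = "thursday tuesday thuesday tursday saturday sunday surday satunday monday".split(" ");
const matchedWords = words.filter((w) => composed.test(w));
console.log(composed.source, matchedWords);
```

Editing one of the shared literals (for example end) changes both branches at once, which is the maintainability benefit described above. The cost is that the full pattern is no longer visible in one place.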

References

  • Alternation Constructs in Regular Expression - .NET - Microsoft
  • re package in Python: Ctrl+F for (?(
  • perlre - Perl regular expression: Ctrl+F for (?(
nhahtdh answered Sep 28 '22 15:09
