
Why doesn’t the alternation (pipe) operator ( | ) in JavaScript regular expressions give me two matches?

Here is my regular expression:

"button:not([DISABLED])".match(/\([^()]+\)|[^()]+/g); 

The result is:

["button:not", "([DISABLED])"] 

Is that correct? I'm confused: since the pipe operator | means "or", I expected the result to be:

["button:not", "[DISABLED]", "([DISABLED])"]  

Because this:

["button:not", "[DISABLED]"] 

is the result of:

"button:not([DISABLED])".match(/[^()]+/g); 

and this:

["([DISABLED])"] 

is the result of:

"button:not([DISABLED])".match(/\([^()]+\)/g); 

But the console output tells me the result is:

["button:not", "([DISABLED])"] 

Where is the problem?

user2155362 asked Jun 29 '13 09:06


2 Answers

The regex

/\([^()]+\)|[^()]+/g 

basically says: there are two options, match (1) \([^()]+\) OR (2) [^()]+, wherever either of them occurs (/g).

Let's walk through your sample string so you can see the reason behind the result.

Starting string:

button:not([DISABLED]) 

Steps:

  • The cursor begins at the char b (strictly speaking, at the position just before it, index 0, but that makes no difference for this example).
  • Of the two available options, b can only match (2), since (1) requires a leading (.
    • Having started to match (2), it keeps going as far as it can, consuming everything that is not a ( or ).
    • So it consumes everything up to and including the t char (the next char is a (, which [^()]+ cannot match), leaving button:not as the first matched string.
  • Now the cursor is at (. Can either option start matching here? Yes, the first one: \([^()]+\).
    • Again, having started to match (1), it carries on: it consumes everything that is not a ( or ) and then requires a ) (if it ran into a ( before finding a ), option (1) would fail at this position and the engine would backtrack).
    • Here it consumes [DISABLED] and then finds the ), leaving ([DISABLED]) as the second matched string. Those characters are now consumed, so option (2) never gets a chance to match [DISABLED] on its own.
  • Since we have reached the end of the string, the regex processing ends.
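The steps above can be checked with a small script. This is only a minimal sketch: traceMatches is a hypothetical helper name (not a built-in), and it relies on the standard RegExp.prototype.exec and lastIndex behaviour of a global regex to show where each match starts and where the cursor resumes.

```javascript
// Hypothetical helper: trace every match of a regex, recording where the
// match starts and where the cursor (lastIndex) resumes afterwards.
function traceMatches(pattern, input) {
  // Work on a copy so the caller's lastIndex is untouched; force the g flag.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const re = new RegExp(pattern.source, flags);
  const steps = [];
  let m;
  while ((m = re.exec(input)) !== null) {
    steps.push({ text: m[0], start: m.index, next: re.lastIndex });
    // Guard against an infinite loop on zero-length matches.
    if (m[0].length === 0) re.lastIndex++;
  }
  return steps;
}

const steps = traceMatches(/\([^()]+\)|[^()]+/g, "button:not([DISABLED])");
for (const s of steps) {
  console.log(`matched "${s.text}" at ${s.start}, cursor resumes at ${s.next}`);
}
// matched "button:not" at 0, cursor resumes at 10
// matched "([DISABLED])" at 10, cursor resumes at 22
```

The second match starts exactly where the first one ended, and the search stops at index 22, the end of the string. No position is ever scanned twice, which is why [DISABLED] does not show up as a separate match.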



Edit: There's a very useful online tool that lets you see the regex in graphical form, which may help in understanding how the regex works (regular expression image). You can also move the cursor step by step there and follow what I tried to explain above (live link).

Note about the precedence of expressions separated by |: because of the way the JavaScript regex engine processes strings, the order in which the alternatives appear matters. At each position the engine tries the alternatives in the order they are given. If one of them matches completely, the engine does not try the remaining ones at that position, even if they could match too. Hopefully an example makes it clearer:

"aaa".match(/a|aa|aaa/g); // ==> ["a", "a", "a"]
"aaa".match(/aa|aaa|a/g); // ==> ["aa", "a"]
"aaa".match(/aaa|a|aa/g); // ==> ["aaa"]
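
One caveat, as a minimal sketch: the "first alternative wins" rule applies when the rest of the pattern also succeeds. If whatever follows the alternation fails, the engine backtracks and tries the next alternative. The strings and patterns below are made up for illustration only.

```javascript
// The first alternative "a" matches at index 0, but the following "c" then
// fails against "b", so the engine backtracks and tries "ab" instead.
const backtracked = "abc".match(/(a|ab)c/);
console.log(backtracked[0]); // "abc"
console.log(backtracked[1]); // "ab"

// With nothing after the alternation, the first successful alternative is kept.
const firstWins = "abc".match(/a|ab/);
console.log(firstWins[0]); // "a"
```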
acdcjunior answered Oct 17 '22 17:10


Your understanding of the alternation operator seems to be incorrect. It does not look for all possible matches, only for the first alternative that matches at the current position (trying them from left to right).

Consider (a | b) as "match either a or b".
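
If the combined list from the question is what you actually want, one possible approach (a sketch, not the only way) is to run the two patterns separately and merge the results. The variable names here are illustrative, and Set is used only to drop duplicates.

```javascript
const input = "button:not([DISABLED])";

// One pass with alternation: each character is consumed by at most one match.
const combined = input.match(/\([^()]+\)|[^()]+/g) || [];
console.log(combined); // ["button:not", "([DISABLED])"]

// Two independent passes, each scanning the whole string on its own.
const outside = input.match(/[^()]+/g) || [];   // ["button:not", "[DISABLED]"]
const wrapped = input.match(/\([^()]+\)/g) || []; // ["([DISABLED])"]

// Merge the two result lists, dropping exact duplicates.
const union = [...new Set([...outside, ...wrapped])];
console.log(union); // ["button:not", "[DISABLED]", "([DISABLED])"]
```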

See also: http://www.regular-expressions.info/alternation.html

Felix Kling answered Oct 17 '22 18:10