
Please can someone help me understand the exec method for regular expressions?

The best explanation I have found of the exec method is in Eloquent JavaScript, Chapter 9:

"Regular expressions also have an exec (execute) method that will return null if no match was found and return an object with information about the match otherwise. An object returned from exec has an index property that tells us where in the string the successful match begins. Other than that, the object looks like (and in fact is) an array of strings, whose first element is the string that was matched...."

So far this makes sense but then it gets a bit confusing:

"When the regular expression contains subexpressions grouped with parentheses, the text that matched those groups will also show up in the array. The whole match is always the first element."

okay but...

"The next element is the part matched by the first group (the one whose opening parenthesis comes first in the expression), then the second group, and so on."

var quotedText = /'([^']*)'/;
console.log(quotedText.exec("she said 'hello'"));
// → ["'hello'", "hello"]

My confusion is with the repeated hello in this example. I don't understand why it would give me two hellos back?

And then the topic is wrapped up by the following:

"When a group does not end up being matched at all (for example, when followed by a question mark), its position in the output array will hold undefined. Similarly, when a group is matched multiple times, only the last match ends up in the array."

console.log(/bad(ly)?/.exec("bad"));
// → ["bad", undefined]
console.log(/(\d)+/.exec("123"));
// → ["123", "3"]

This last sentence and example still confuse me.

Any light shed on this would be much appreciated!

Anna asked Mar 15 '23 16:03



1 Answer

I don't understand why it would give me two hellos back?

Because the first entry in the array is the overall match for the expression, which is then followed by the content of any capture groups the expression defines. Since the expression defines one capture group, you get back two entries. The overall match is 'hello' (with the single quotes), and the capture group is hello (without them), because in the regular expression only the [^']* part, which matches hello, is inside the capture group (the parentheses), while the two ' characters are outside it:

 vvvvvvvvv----- Overall expression
/'([^']*)'/
  ^^^^^^^------ Capture group
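
To make the two entries concrete, here is a minimal sketch that reads them out one at a time, along with the index property the book mentions. The variable names match and noMatch are just illustrative, and the second input string is an assumed example that is not in the question:

```javascript
// Same pattern as in the question: a quote, any non-quote characters, a quote.
var quotedText = /'([^']*)'/;
var match = quotedText.exec("she said 'hello'");

console.log(match[0]);     // overall match, quotes included: 'hello'
console.log(match[1]);     // first capture group, quotes excluded: hello
console.log(match.index);  // position where the overall match starts: 9
console.log(match.length); // 2 (one overall match plus one capture group)

// exec returns null when nothing matches, so check before indexing into it.
var noMatch = quotedText.exec("she said hello");
console.log(noMatch);      // null
```

So the array does not hold two copies of the same thing: entry 0 is everything the pattern matched, and entry 1 is only what the parentheses matched.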

Let's look at that /bad(ly)?/ example: What it says is "match bad optionally followed by ly, capturing the ly if it's there." So you get:

console.log(/bad(ly)?/.exec("bad"));
// -> ["bad", undefined]
//     ^      ^
//     |      +--- first capture group has nothing in it
//     +---------- overall match is "bad"
console.log(/bad(ly)?/.exec("badly"));
// -> ["badly", "ly"]
//     ^        ^
//     |        +- first capture group has "ly"
//     +---------- overall match is "badly"
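
In practice this means code that reads an optional group usually checks for undefined first. A small sketch of that check, where describeSuffix is a hypothetical helper name made up for illustration:

```javascript
// Hypothetical helper: reports whether the optional (ly) group took part in the match.
function describeSuffix(word) {
  var m = /bad(ly)?/.exec(word);
  if (m === null) {
    return "no match";            // the pattern did not match at all
  }
  if (m[1] === undefined) {
    return "no suffix";           // matched "bad", but the group captured nothing
  }
  return "suffix: " + m[1];       // the group captured "ly"
}

console.log(describeSuffix("bad"));   // no suffix
console.log(describeSuffix("badly")); // suffix: ly
console.log(describeSuffix("good"));  // no match
```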

Suppose we put the l and y in individual capture groups, and make both of them optional:

console.log(/bad(l)?(y)?/.exec("bad"));
// -> ["bad", undefined, undefined]
//     ^      ^          ^
//     |      |          +--- Nothing in the second capture group
//     |      +-------------- Nothing in the first capture group
//     +--------------------- Overall match is "bad"
console.log(/bad(l)?(y)?/.exec("badly"));
// -> ["badly", "l", "y"]
//     ^        ^    ^
//     |        |    +------- Second capture group has "y"
//     |        +------------ First capture group has "l"
//     +--------------------- Overall match is "badly"
console.log(/bad(l)?(y)?/.exec("badl"));
// -> ["badl", "l", undefined]
//     ^       ^    ^
//     |       |    +-------- Second capture group has nothing in it
//     |       +------------- First capture group has "l"
//     +--------------------- Overall match is "badl"
console.log(/bad(l)?(y)?/.exec("bady"));
// -> ["bady", undefined, "y"]
//     ^       ^          ^
//     |       |          +-- Second capture group has "y"
//     |       +------------- First capture group has nothing in it
//     +--------------------- Overall match is "bady"
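
The /(\d)+/ example from the question works on the same principle, with one twist: the group matches a single digit and the + repeats the group, so each repetition overwrites what the group held before. A minimal sketch, assuming the goal is to see the difference between repeating a group and grouping a repetition (the variable names are illustrative):

```javascript
// (\d)+ repeats a one-digit group; the group slot is overwritten on each pass.
var repeated = /(\d)+/.exec("123");
console.log(repeated[0]); // 123 (the overall match)
console.log(repeated[1]); // 3   (the last digit the group matched)

// Moving the + inside the parentheses captures the whole run of digits instead.
var wholeRun = /(\d+)/.exec("123");
console.log(wholeRun[1]); // 123

// To get every digit separately, use a global match rather than a repeated group.
var digits = "123".match(/\d/g);
console.log(digits);      // [ '1', '2', '3' ]
```

There is only one pair of parentheses in /(\d)+/, so there is only one slot in the result array for it, no matter how many times the group repeats.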
T.J. Crowder answered Apr 25 '23 21:04
