
Why is this regex matching also words within a non-capturing group?

I have this string (notice the multi-line syntax):

var str = `   Number One: Get this
    Number Two: And this`;

And I want a regex that returns (with match):

[str, 'Get this', 'And this']

So I tried str.match(/Number (?:One|Two): (.*)/g), but that returns:

["Number One: Get this", "Number Two: And this"]

There can be any whitespace/line-breaks before any "Number" word.

Why doesn't it return only what is inside the capturing group? Am I misunderstanding something? And how can I achieve the desired result?

Jacob asked May 26 '15 01:05




2 Answers

Per the MDN documentation for String.match:

If the regular expression includes the g flag, the method returns an Array containing all matched substrings rather than match objects. Captured groups are not returned. If there were no matches, the method returns null.

(emphasis mine).

So what you want is not possible with match and the g flag alone.

The same page adds:

  • if you want to obtain capture groups and the global flag is set, you need to use RegExp.exec() instead.

So if you're willing to give up on using match, you can write your own function that repeatedly applies the regex, gets the captured substrings, and builds an array.
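A minimal sketch of such a function, assuming only the first capture group is wanted and that the full input string should come first, as in the question. The name collectCaptures is hypothetical and not part of any library:

```javascript
// Hypothetical helper: returns [str, capture1, capture2, ...] using RegExp.exec.
function collectCaptures(str, regex) {
    // exec only advances through the string when the g flag is set;
    // without it the loop below would never terminate.
    if (!regex.global) {
        throw new Error("regex must have the g flag");
    }
    regex.lastIndex = 0; // start from the beginning of the string
    var result = [str];
    var m;
    while ((m = regex.exec(str)) !== null) {
        // Step past zero-length matches so the loop cannot stall.
        if (m[0] === "") {
            regex.lastIndex++;
        }
        result.push(m[1]); // m[1] is the first capture group
    }
    return result;
}

var str = `   Number One: Get this
    Number Two: And this`;
var captures = collectCaptures(str, /Number (?:One|Two): (.*)/g);
console.log(captures);
```

Because exec returns the capture groups for each match, the whitespace before each "Number" never ends up in the result.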


Or, for your specific case, you could write something like this:

var these = str.split(/(?:^|\n)\s*Number (?:One|Two): /);
these[0] = str;
ruakh answered Oct 11 '22 10:10


Replace and store the result in a new string, like this:

var str = `   Number One: Get this
Number Two: And this`;
var output = str.replace(/Number (?:One|Two): (.*)/g, "$1");
console.log(output);

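which outputs the following (any whitespace before each "Number" is not part of the match, so it stays in the output):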

Get this
And this

If you want the match array like you requested, you can try this:

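var getMatch = function(string, split, regex) {
    // Replace each match with its first capture group followed by the separator,
    // then cut the result apart at the separators.
    var match = string.replace(regex, "$1" + split).split(split);
    match.pop();           // drop whatever follows the last separator
    match.unshift(string); // put the full input string first
    return match;
};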

var str = `   Number One: Get this
    Number Two: And this`;
var regex = /Number (?:One|Two): (.*)/g;
var match = getMatch(str, "#!SPLIT!#", regex);
console.log(match);

which displays the array, with each captured entry still carrying the whitespace that preceded its match:

[ '   Number One: Get this\n    Number Two: And this',
'   Get this',
'\n    And this' ]

Here split (#!SPLIT!# in this example) must be a string that cannot occur in the input, since it is used to separate the matches. Note that this only works for a single capture group. For multiple groups, add a parameter for the number of groups and use a for loop to build the replacement "$1 $2 $3 $4 ..." + split.
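A rough sketch of that multi-group variant. The name getMatches and the groupSplit and groupCount parameters are assumptions made for illustration. It also joins the group references with a second separator instead of a space, so that captures containing spaces can still be told apart:

```javascript
// Hypothetical multi-group variant of getMatch.
// split and groupSplit must both be strings that cannot occur in the input.
function getMatches(string, split, groupSplit, regex, groupCount) {
    // Build "$1<groupSplit>$2<groupSplit>...$n" for the requested number of groups.
    var refs = [];
    for (var i = 1; i <= groupCount; i++) {
        refs.push("$" + i);
    }
    var replacement = refs.join(groupSplit) + split;

    var pieces = string.replace(regex, replacement).split(split);
    pieces.pop(); // drop whatever follows the last separator

    var result = [string];
    for (var j = 0; j < pieces.length; j++) {
        // Each piece becomes an array with one entry per capture group.
        result.push(pieces[j].split(groupSplit));
    }
    return result;
}

var input = `   Number One: Get this
    Number Two: And this`;
var groups = getMatches(input, "#!SPLIT!#", "#!GROUP!#", /Number (One|Two): (.*)/g, 2);
console.log(groups);
```

As with the single-group version, text that precedes a match (here the leading whitespace) is not removed, so it ends up attached to the first group of that match and may need trimming.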

ShellFish answered Oct 11 '22 11:10