
Javascript Regex Match Capture is returning whole match, not group

re = /\s{1,}(male)\.$/gi;
"A girl is a female, and a boy is a male.".match(re);

This results in " male."

What I want is "male"

I put male in parentheses and I thought that would capture just that group.

Thanks for the help

asked Mar 10 '11 at 19:03 by james




1 Answer

I know that this question is very old but all the answers here are just plain wrong. What really bugs me is that the answers don't add anything useful to the community.

First

Question: Why does the regex result in " male."?

re = /\s{1,}(male)\.$/gi;
"A girl is a female, and a boy is a male.".match(re);

Answer: Because " male." is the only match.

Question: Why didn't (male) get returned?

Answer: Because captured groups are not returned by match() when the g flag is used.

From the documentation:

If the regular expression includes the g flag, the method returns an Array containing all matched substrings rather than match objects. Captured groups are not returned. If there were no matches, the method returns null.
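To see that difference for yourself, here is a small sketch that runs the questioner's pattern with and without the g flag. The variable names are mine, not from the question.

```javascript
const input = "A girl is a female, and a boy is a male.";

// With the g flag: an array of full matches only, no capture groups.
const withG = input.match(/\s{1,}(male)\.$/gi);
console.log(withG); // [ ' male.' ]

// Without the g flag: the full match first, then each capture group.
const withoutG = input.match(/\s{1,}(male)\.$/i);
console.log(withoutG[0]); // ' male.'
console.log(withoutG[1]); // 'male'

// No match at all gives null rather than an empty array.
const none = "no match here".match(/\s{1,}(male)\.$/gi);
console.log(none); // null
```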

Second

Let's break down the regex and figure out what pattern it's really matching.

patterns

  • \s{1,} means match at least one white-space. This is the same as \s+.
  • (male) means match male and capture it.
  • \.$ means match a period at the end of the input.

flags

  • g means find all matches rather than stopping after the first match
  • i means ignore case

However, all of those patterns are stuck together. Those patterns do not stand by themselves.

What the regex is matching is: one or more white-space characters, followed by "male", followed by a . at the end of the input. In the example the only portion of the input that matches is " male.".
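A quick way to check that reading of the pattern is to try it on a few inputs. This is only a sketch; the test strings are made up for illustration, and the g flag is left off so that test() keeps no state between calls.

```javascript
const re = /\s{1,}(male)\.$/i;

// One space, "male", then a period at the very end.
console.log(re.test("a boy is a male."));      // true

// \s{1,} accepts a run of white-space, and i ignores case.
console.log(re.test("a boy is a    MALE."));   // true

// The period after "male" is not at the end of the input.
console.log(re.test("a boy is a male. Yes.")); // false

// Nothing in front of "male", so \s{1,} has nothing to match.
console.log(re.test("male."));                 // false
```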

Third

So, what happens when we remove the g flag?

If the string matches the expression, it will return an Array containing the entire matched string as the first element, followed by any results captured in parentheses. If there were no matches, null is returned.

If the regular expression does not include the g flag, str.match() will return the same result as RegExp.exec(). The returned Array has an extra input property, which contains the original string that was parsed. In addition, it has an index property, which represents the zero-based index of the match in the string.

re = /\s{1,}(male)\.$/i;
"A girl is a female, and a boy is a male.".match(re);

The new result is an array with some extra properties: index and input.

res: Array(2)
    0: " male."
    1: "male"
    groups: undefined
    index: 34
    input: "A girl is a female, and a boy is a male."
    length: 2

It's easy to manipulate that result to get what you wanted. However ....
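For example, pulling the captured text out of that array could look like the sketch below. The variable names and the group name "word" are made up for illustration; named groups assume an ES2018 or newer engine.

```javascript
const input = "A girl is a female, and a boy is a male.";

// match() returns null when nothing matches, so guard before indexing.
const result = input.match(/\s{1,}(male)\.$/i);
const word = result ? result[1] : null;
console.log(word); // 'male'

// Same idea with a named group, read from the groups property.
const named = input.match(/\s{1,}(?<word>male)\.$/i);
const namedWord = named ? named.groups.word : null;
console.log(namedWord); // 'male'
```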

Fourth

I really, really, really wanted the regex to only return "male". Guess what, you can really, really, really do that with pure regex.

re = /(?<=\s)male(?=\.$)/gi;
"A girl is a female, and a boy is a male.".match(re);

This results in ["male"]: exactly the text the questioner asked for, with no extra white-space or period.

Notice that the g flag is back? It makes no difference to what is matched in this example, but it will later.

Let's break it down:

  • (?<=\s) is a lookbehind: match the following pattern only if it's preceded by a white-space character.
  • male matches male; duh.
  • (?=\.$) is a lookahead: match the previous pattern only if it's followed by a . at the end of the input.

Neither lookaround consumes any characters, so the white-space and the period are checked but never become part of the match. Lookbehind needs an ES2018 or newer engine.

Put it all together and (?<=\s)male(?=\.$) means match male only if it's preceded by a white-space character and followed by a period at the end of the input.
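One way to enforce the "preceded by white-space" condition is a lookbehind, (?<=\s). The sketch below compares a pattern with it against one without it. It assumes an engine with lookbehind support (ES2018 or newer), and the second input sentence is made up for illustration.

```javascript
const withLookbehind = /(?<=\s)male(?=\.$)/gi;
const lookaheadOnly = /male(?=\.$)/gi;

const endsWithMale = "A girl is a female, and a boy is a male.";
const endsWithFemale = "A boy is a male, and a girl is a female.";

// White-space before "male" and ".$" after it: one clean match.
console.log(endsWithMale.match(withLookbehind));   // [ 'male' ]

// The "male" inside "female." is preceded by "e", so nothing matches.
console.log(endsWithFemale.match(withLookbehind)); // null

// Without the lookbehind, the tail of "female." slips through.
console.log(endsWithFemale.match(lookaheadOnly));  // [ 'male' ]
```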

FINALLY

What about that g flag? Can we see it do something?

As previous users said, the \.$ makes the g flag irrelevant for matching, because the input has only one end. It is not irrelevant for the output, though: as shown above, it changes what match() returns.

What if we changed the input to "A girl is a female, and a boy is a male. A female likes a good male."?

Get rid of the $ and see the g flag work its magic. The lookbehind (?<=\s) requires a white-space character right before male, so the male inside female can never match.

re = /(?<=\s)male(?=\.)/ig;
res = "A girl is a female, and a boy is a male. A female likes a good male.".match(re);

Now, the output is an array with just the matches: ['male', 'male'].
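If you would rather keep the capture group from the original question, matchAll() is another option: it needs the g flag and still exposes the groups of every match. This is a sketch assuming an ES2020 or newer engine, and the variable names are mine.

```javascript
const input =
  "A girl is a female, and a boy is a male. A female likes a good male.";

// matchAll() throws a TypeError if the regex lacks the g flag.
const re = /\s(male)\./gi;

// Each item is a full match object, so m[1] is the captured group.
const words = Array.from(input.matchAll(re), (m) => m[1]);
console.log(words); // [ 'male', 'male' ]
```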

I feel better now.

answered Oct 01 '22 at 02:10 by shrewmouse