Why does a capturing group result in double matches in a regex?

Consider these two scripts:

1st: " ".match(/(\s)/)

and

2nd: " ".match(/\s/)

Results

1st: [" "," "]

2nd: [" "]

I don't understand this behavior. As far as I knew, the purpose of a capturing group (parentheses) was to let a section of the match be referred to again later within the regex. But clearly that's not all. Or is this behavior specific to the match and split methods?

Techsin asked Sep 02 '13 16:09


2 Answers

Capturing groups serve two purposes. They can be referred to later in the regexp (or in the replacement string when using .replace()), but they are also returned by the matching function so they can be used by the caller. This is why .match() returns an array: result[0] is the match for the whole regexp, and result[n] is the match for the nth capture group.
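
As a minimal sketch of both purposes, the snippet below reuses the two scripts from the question and adds a date string that is purely illustrative (the variable names are made up for this example):

```javascript
// The two scripts from the question.
const withGroup = " ".match(/(\s)/);   // whole match + one capture group
const withoutGroup = " ".match(/\s/);  // whole match only

console.log(withGroup.length);    // 2
console.log(withGroup[0]);        // " " (match for the whole regexp)
console.log(withGroup[1]);        // " " (match for capture group 1)
console.log(withoutGroup.length); // 1

// Illustrative input (not from the question): three capture groups.
const dateParts = "2013-09-02".match(/(\d+)-(\d+)-(\d+)/);
console.log(dateParts[0]); // "2013-09-02"
console.log(dateParts[1]); // "2013"
console.log(dateParts[3]); // "02"

// The same groups referenced from a replacement string via $1, $2, $3.
const swapped = "2013-09-02".replace(/(\d+)-(\d+)-(\d+)/, "$3/$2/$1");
console.log(swapped); // "02/09/2013"
```

The two entries in the first result are the same string only because the group happens to wrap the entire pattern.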

string.split splices the matches for capture groups into the resulting array. The documentation says:

If separator is a regular expression that contains capturing parentheses, then each time separator is matched the results (including any undefined results) of the capturing parentheses are spliced into the output array. However, not all browsers support this capability.
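
To see the splicing described in that quote, here is a small sketch on an illustrative input string ("a b c" is an assumption, not taken from the question), comparing the same separator with and without capturing parentheses:

```javascript
// Illustrative input; the question itself only uses a single space.
const text = "a b c";

// No capturing parentheses: the separators are discarded.
const plain = text.split(/\s/);
console.log(plain); // [ 'a', 'b', 'c' ]

// Capturing parentheses: each matched separator is spliced into the output.
const spliced = text.split(/(\s)/);
console.log(spliced); // [ 'a', ' ', 'b', ' ', 'c' ]
```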

Barmar answered Sep 28 '22 17:09


First script: The first result is the whole pattern, the second is the capturing group

Second script: the only result is the whole pattern.

Capturing groups are not only for referring back to later in the pattern; they are also included in the results.

When you use a capturing group with split, the text matched by the capturing group is returned with the results. Since the separator slices the string, it is normal to obtain ["", " ", ""] as the result with " " as the input string and /(\s)/ as the pattern.
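
The result above can be checked directly. The sketch below also includes a non-capturing group, (?:...), as one possible way to keep the parentheses without getting the separator back; that variant is an addition for comparison, not something this answer mentions:

```javascript
// Capturing group: the empty string before the space, the captured space,
// and the empty string after it.
const captured = " ".split(/(\s)/);
console.log(captured); // [ '', ' ', '' ]

// Non-capturing group (added for comparison): grouping without capturing,
// so only the two empty pieces around the separator remain.
const nonCaptured = " ".split(/(?:\s)/);
console.log(nonCaptured); // [ '', '' ]
```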

More information about split.

When you write " ".match(/(\s)/), the result returned is the first match only. This single result contains:

  • the whole match
  • capturing group(s)
  • index of the match
  • input string
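
The four items in this list can be inspected on the returned array. A minimal sketch, using "a b" as an illustrative input so that the index is not zero:

```javascript
// Non-global match on an illustrative string (not from the question).
const first = "a b".match(/(\s)/);

console.log(first[0]);    // " "   (the whole match)
console.log(first[1]);    // " "   (capturing group 1)
console.log(first.index); // 1     (position of the match in the input)
console.log(first.input); // "a b" (the input string)
```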

When you write " ".match(/(\s)/g), the result returned is a list of all the whole matches, without the capturing groups:

  • whole match 1
  • whole match 2
  • etc.

(in the present case you have only one match)
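
For contrast, a short sketch of the /g form. The second input, "a b c", is an illustrative assumption chosen so that there is more than one match:

```javascript
// The question's input with /g: one whole match, no separate group entry.
const single = " ".match(/(\s)/g);
console.log(single); // [ ' ' ]

// Illustrative input with two separators: two whole matches.
const all = "a b c".match(/(\s)/g);
console.log(all);        // [ ' ', ' ' ]
console.log(all.length); // 2

// With /g the array carries no index property.
console.log(all.index);  // undefined
```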

This behaviour is normal. The match method has two different behaviours (with or without /g). It is effectively two functions in one. For comparison, languages such as PHP, which don't have the g modifier, provide two separate functions: preg_match and preg_match_all in PHP.

Casimir et Hippolyte answered Sep 28 '22 16:09