
Regular Expressions: Capture multiple groups using quantifier

Consider the following code:

<!DOCTYPE html>
<html>
<body>
<script type="text/javascript">

var str = '<12> rnbqkb-r Rnbq-b-r ';

var pat1 = new RegExp('^\\<12\\> ([rnbqkpRNBQKP-]{8}) ([rnbqkpRNBQKP-]{8})');
var pat2 = new RegExp('^\\<12\\> ([rnbqkp RNBQKP-]{8}){2}');
var pat3 = new RegExp('^\\<12\\> ([rnbqkp RNBQKP-]{8}){2}?');

document.write(str.match(pat1));
document.write('<br />');
document.write(str.match(pat2));
document.write('<br />');
document.write(str.match(pat3));

</script>
</body>
</html>

which produces

<12> rnbqkb-r Rnbq-b-r,rnbqkb-r,Rnbq-b-r
<12> rnbqkb-r Rnbq-b-, Rnbq-b-
<12> rnbqkb-r Rnbq-b-, Rnbq-b-

as output.

Why does neither pattern pat2 nor pat3 capture the first group rnbqkb-r? I would like to capture all groups without having to repeat them explicitly as in pattern pat1.

chessweb asked Aug 14 '12 19:08


2 Answers

Why does neither pattern pat2 nor pat3 capture the first group rnbqkb-r?

Because each 8-character sequence in your input is followed by white-space, which the {8} repetition in pat2 and pat3 does not account for.

I would like to capture all groups without having to repeat them explicitly as in pattern pat1.

You can't.

It is not possible (in JavaScript) to capture two groups when your regex only contains one group.

Groups are defined through parentheses. Your match result will contain as many groups as there are pairs of capturing parentheses in your regex (modified parentheses such as (?:...) do not count towards match groups). Want two separate group matches in your match result? Define two separate groups in your regex.

If a group can match multiple times, the group's value will be whatever it matched last. All previous match occurrences for that group will be overridden by its last match.
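The overwrite behaviour can be seen in a small sketch. It assumes a variant of the question's pattern in which the separator space is consumed outside the capture; the name demoPattern and that tweak are illustrative and not part of the original code.

```javascript
// Sample input from the question.
var str = '<12> rnbqkb-r Rnbq-b-r ';

// One capturing group, repeated twice. The optional space sits outside the
// capture, so each repetition captures exactly 8 piece characters.
var demoPattern = /^<12> (?:([rnbqkp-]{8}) ?){2}/i;
var demoMatch = str.match(demoPattern);

// Only one capture slot exists; it holds the value from the last repetition.
console.log(demoMatch.length); // 2 (whole match + one group)
console.log(demoMatch[1]);     // "Rnbq-b-r"
```

The first rank is matched but not retained: the second repetition overwrites it in the single capture slot.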

Try

var pat1 = /^<12> ((?:[rnbqkp-]{8} ?)*)/i,
    match = str.match(pat1);

if (match) {
  match[1].split(/\s+/);  // ["rnbqkb-r", "Rnbq-b-r", ""]
}

Notes:

  • Trim str beforehand if you don't want the last empty array value.
  • In general, prefer regex literal notation (/expression/). Use new RegExp() only for expressions you generate from dynamic values.
  • < and > are not special; you don't need to escape them.
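
As a rough illustration of the first two notes, the sketch below trims the input before matching and builds a pattern with new RegExp() from a run-time value. The variable rankLength is a hypothetical dynamic value introduced only for this example.

```javascript
// Sample input from the question.
var str = '<12> rnbqkb-r Rnbq-b-r ';

// Trimming first removes the trailing space, so split() yields no empty entry.
var trimmedMatch = str.trim().match(/^<12> ((?:[rnbqkp-]{8} ?)*)/i);
var ranks = trimmedMatch ? trimmedMatch[1].split(/\s+/) : [];
console.log(ranks); // ["rnbqkb-r", "Rnbq-b-r"]

// new RegExp() is useful when part of the pattern is only known at run time.
// rankLength is a hypothetical dynamic value, not part of the original code.
var rankLength = 8;
var dynamicPattern = new RegExp('^<12> ((?:[rnbqkp-]{' + rankLength + '} ?)*)', 'i');
console.log(dynamicPattern.test(str)); // true
```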
Tomalak answered Nov 02 '22 23:11


Count again (8 vs 9). pat2 and pat3 are missing the space in between the two parts.

Update: Additionally, I don't think what you are trying to achieve is possible using match. See "How can I match multiple occurrences with a regex in JavaScript similar to PHP's preg_match_all()?" and use exec.
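
A minimal sketch of the exec approach, assuming the "<12> " prefix is checked separately and only the 8-character ranks need collecting. The names rankPattern and ranks are illustrative.

```javascript
// Sample input from the question.
var str = '<12> rnbqkb-r Rnbq-b-r ';

// Global pattern for a single 8-character rank. With the g flag, exec()
// advances lastIndex on each call, so the loop visits every occurrence.
var rankPattern = /[rnbqkp-]{8}/gi;
var ranks = [];
var m;
while ((m = rankPattern.exec(str)) !== null) {
  ranks.push(m[0]);
}
console.log(ranks); // ["rnbqkb-r", "Rnbq-b-r"]
```

Unlike a quantified group, each call to exec returns its own match object, so earlier occurrences are not overwritten.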

Prinzhorn answered Nov 02 '22 23:11