 

JavaScript regex exec() returns match repeated in a list, why?

The following regex picks out the tokens needed to construct an s-expression from a JS string. It is followed by a long block comment that documents how it is built up. I included the comment because I am new to regex and may be misunderstanding one of these points. What I don't understand is why each result returned by regex.exec() is the same match repeated twice and grouped as a list.

var tx = /\s*(\(|\)|[^\s()]+|$)/g; // Create a regular expression
/*       /1 234  5  6      7 8 /global search
        1. \s      : whitespace metacharacter
        2. n*      : matches zero or more consecutive occurrences of n
        3. (a|b|c) : find any of the alternatives specified
        4. \(      : escaped open paren, match "(" (since parens are reserved
                     characters in regex)
        5. \)      : escaped close paren, match ")"
        6. [^abc]  : find any character not between the brackets
        7. n+      : matches one or more consecutive occurrences of n
        8. $       : matches the end of the input
RESULT - Find matches that have zero or more leading whitespace characters (1+2)
followed by one of the following (3): open paren (4) -OR- close paren (5)
-OR- a run of at least one non-whitespace, non-paren character (6+7)
-OR- the end of the input (8), searching globally to find all matches */

var textExpression = "(1 2 3)";
var execSample;
for (var i = 0; i < textExpression.length; i++) {
    execSample = tx.exec(textExpression);
    display(execSample);
}

Here is what is printed:

(,(
1,1
 2,2
 3,3
),)
,
null

Why are the matches repeated as lists?

SquareCrow asked Oct 28 '25 08:10



2 Answers

You're NOT getting exactly the same items in the printed list.

  • The 1st item keeps the leading spaces; it is the whole match ($0)
  • The 2nd item is the text without the spaces; it is the first capture group ($1)
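
To make the difference visible, here is a minimal sketch that prints both items in quotes for the string from the question. The `rows` array and the early stop on the empty match are my own additions, not part of the original code; the stop is there because `$` produces an empty match at the end of the input, and calling exec() again after that would not advance in current engines.

```javascript
var tx = /\s*(\(|\)|[^\s()]+|$)/g;
var textExpression = "(1 2 3)";
var rows = []; // assumption: collected here only so the output can be printed at once
var m;

// Stop at the empty match that `$` produces at the end of the input.
while ((m = tx.exec(textExpression)) !== null && m[0] !== "") {
    // m[0] is the whole match ($0), m[1] is the first capture group ($1)
    rows.push(JSON.stringify(m[0]) + " -> " + JSON.stringify(m[1]));
}

console.log(rows.join("\n"));
// "(" -> "("
// "1" -> "1"
// " 2" -> "2"
// " 3" -> "3"
// ")" -> ")"
```

The quoted output shows that the two items only look identical when the match has no leading whitespace.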

If you change your regex to this:

var tx = /\s*(?:\(|\)|[^\s()]+|$)/g;

Then you will get a single item in the printed list.
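
A quick check of that claim, as a sketch (the variable names `txNoCapture`, `first`, `second` and `third` are only for illustration). Keep in mind that the one remaining item is the whole match, so any leading whitespace is still part of it.

```javascript
var txNoCapture = /\s*(?:\(|\)|[^\s()]+|$)/g;
var text = "(1 2 3)";

// Each call continues from where the previous match ended (global flag).
var first = txNoCapture.exec(text);
var second = txNoCapture.exec(text);
var third = txNoCapture.exec(text);

console.log(first.length);             // 1 (only the whole match, no group entry)
console.log(JSON.stringify(first[0])); // "("
console.log(JSON.stringify(third[0])); // " 2" (leading space is kept)
```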

anubhava answered Oct 31 '25 07:10



It's because you've got that parenthesized group in your regular expression. The .exec() function returns an array. In the array, the first element (element 0) will contain the entire match, and then the subsequent elements contain the matched groups.
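
That array layout is also what makes the original pattern handy for tokenizing: element 1 holds the token with the leading whitespace already stripped. A minimal sketch, assuming a helper called `tokenize` (my own name, not something from the question) that keeps only element 1 and stops once the `$` alternative matches at the end of the input:

```javascript
// Hypothetical helper: collect only the captured token (element 1) of each match.
function tokenize(text) {
    var tx = /\s*(\(|\)|[^\s()]+|$)/g;
    var tokens = [];
    var m;
    while ((m = tx.exec(text)) !== null) {
        if (m[1] === "") {
            break; // `$` matched at the end of the input: nothing left to read
        }
        tokens.push(m[1]); // element 0 (whole match) is ignored on purpose
    }
    return tokens;
}

console.log(tokenize("(1 2 3)")); // [ '(', '1', '2', '3', ')' ]
```

The explicit break matters: without it, the empty match at the end would be returned again on every later call in current engines, and the loop would not terminate.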

If you don't want that, you can use a non-capturing group:

var tx = /\s*(?:\(|\)|[^\s()]+|$)/g; 
Pointy answered Oct 31 '25 06:10
