Consecutive calls to RegExp test fail for pattern with global option [duplicate]

I've been wrestling with this all day, and I can't figure out whether I'm doing something wrong or have found a bug in Chrome's JavaScript engine. It appears that consecutive calls to test() on a RegExp object with the global flag return inconsistent results for the same input string. I'm testing with the following function:

function testRegex(pattern, array) {
    document.writeln('Pattern = ' + pattern + ', Array = ' + array + '<br/>');
    for (var ii = 0; ii < array.length; ii++) {
        document.writeln(ii + ', ');
        document.writeln(array[ii] + ', ');
        document.writeln(pattern.test(array[ii]) + '<br />');
    }
    document.writeln('<br/>');
}

When I call the function with /a/g as the pattern and various arrays of strings, I get the following results, many of which are incorrect as far as I can tell:

// EXPECTED: True
// ACTUAL:   True
testRegex(/a/g, ['a']);

// EXPECTED: True,  True
// ACTUAL:   True,  False 
testRegex(/a/g, ['a', 'a']);

// EXPECTED: True, True,  True
// ACTUAL:   True, False, True
testRegex(/a/g, ['a', 'a', 'a']);

// EXPECTED: True, False, True
// ACTUAL:   True, False, True
testRegex(/a/g, ['a', 'b', 'a']);

// EXPECTED: True, True,  True, True
// ACTUAL:   True, False, True, False
testRegex(/a/g, ['a', 'a', 'a', 'a']);

// EXPECTED: True, False, False, True
// ACTUAL:   True, False, False, True   
testRegex(/a/g, ['a', 'b', 'b', 'a']);

When I call the same function with the same arrays of strings, but pass /a/ as the pattern, the actual results all match the expected results.

// EXPECTED: True
// ACTUAL:   True
testRegex(/a/, ['a']);

// EXPECTED: True, True
// ACTUAL:   True, True
testRegex(/a/, ['a', 'a']);

// EXPECTED: True, True, True
// ACTUAL:   True, True, True
testRegex(/a/, ['a', 'a', 'a']);

// EXPECTED: True, False, True
// ACTUAL:   True, False, True
testRegex(/a/, ['a', 'b', 'a']);

// EXPECTED: True, True, True, True
// ACTUAL:   True, True, True, True
testRegex(/a/, ['a', 'a', 'a', 'a']);

// EXPECTED: True, False, False, True
// ACTUAL:   True, False, False, True
testRegex(/a/, ['a', 'b', 'b', 'a']);

I've created a working example of the code above: http://jsfiddle.net/FishBasketGordo/gBWsN/

Am I missing something? Shouldn't the results be the same for the given arrays of strings whether or not the pattern is global? Note that I've primarily been working in Chrome, but I've observed similar incorrect results in Firefox 4 and IE 8.

asked Jul 18 '11 at 20:07 by FishBasketGordo

2 Answers

If you change your test loop as follows:

for (var ii = 0; ii < array.length; ii++) {
    document.writeln(ii + ', ');
    document.writeln(array[ii] + ', ');
    document.writeln(pattern.test(array[ii]) + '<br />');
    pattern.lastIndex = 0; // reset so the next test() starts searching at index 0
}

Then your code will work. The problem is that the "g" flag makes the RegExp object stateful. After the first successful match in that loop, its "lastIndex" property is set to 1. If you don't reset it, the next call to test() assumes you're asking it to keep going from offset 1. For a one-character string like 'a', that search fails, and a failed search resets "lastIndex" to 0, which is why the results alternate between true and false.
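To make the lastIndex behaviour concrete, here is a minimal sketch that records lastIndex before and after each test() call. It assumes Node.js or a browser console (console.log instead of document.writeln), and traceTest is a hypothetical helper name used only for illustration.

```javascript
// traceTest is a hypothetical helper: it runs pattern.test() on each input
// and records where each search started and where lastIndex ended up.
function traceTest(pattern, inputs) {
  const rows = [];
  for (const input of inputs) {
    const startIndex = pattern.lastIndex;   // where this search begins
    const result = pattern.test(input);     // success advances lastIndex, failure resets it to 0
    rows.push({ input, startIndex, result, lastIndexAfter: pattern.lastIndex });
  }
  return rows;
}

const rows = traceTest(/a/g, ['a', 'a', 'a', 'a']);
console.log(rows.map((r) => r.result));     // [ true, false, true, false ]
console.log(rows.map((r) => r.startIndex)); // [ 0, 1, 0, 1 ]
```

With ['a', 'b', 'a'] the same trace gives true, false, true. The failed search on 'b' resets lastIndex to 0 before the next 'a', which is why some of the global cases in the question happened to match expectations.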

Using the "g" flag on a regular expression outside the context of a ".replace()" call has odd semantic implications anyway.
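If all you need is a yes/no answer, the shared lastIndex state can be sidestepped entirely. A sketch, assuming the caller might be handed a pattern that carries the g (or sticky y) flag: matchesOnce (a hypothetical name) rebuilds the pattern without those flags, and String.prototype.search always searches from index 0 and restores lastIndex afterwards. RegExp.prototype.flags requires ES2015 or later.

```javascript
// matchesOnce is a hypothetical helper: it copies the pattern without the
// stateful g and y flags, so every call searches from index 0 and the
// caller's RegExp object (and its lastIndex) is never touched.
function matchesOnce(pattern, input) {
  const statelessFlags = pattern.flags.replace(/[gy]/g, '');
  return new RegExp(pattern.source, statelessFlags).test(input);
}

const re = /a/g;
const inputs = ['a', 'a', 'b', 'a'];

// Fresh non-global copy per call: no state carried between inputs.
const viaCopy = inputs.map((s) => matchesOnce(re, s));

// search() starts at index 0 and puts lastIndex back the way it found it.
const viaSearch = inputs.map((s) => s.search(re) !== -1);

console.log(viaCopy);      // [ true, true, false, true ]
console.log(viaSearch);    // [ true, true, false, true ]
console.log(re.lastIndex); // 0
```

Building a new RegExp on every call costs a little extra work; when the pattern is a fixed literal, simply writing it without the flag (as /a/ in the question) is the simpler fix.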

answered Oct 23 '22 at 18:10 by Pointy

It is not a bug, but a feature. The results you get are not "incorrect", only unexpected.

10.3.2. RegExp Instance Properties

Each RegExp object has five properties. The source property is a read-only string that contains the text of the regular expression. The global property is a read-only boolean value that specifies whether the regular expression has the g flag. The ignoreCase property is a read-only boolean value that specifies whether the regular expression has the i flag. The multiline property is a read-only boolean value that specifies whether the regular expression has the m flag. The final property is lastIndex, a read-write integer. For patterns with the g flag, this property stores the position in the string at which the next search is to begin. It is used by the exec( ) and test( ) methods, as described in the previous section.

Source
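The quoted passage says that, for g patterns, lastIndex is where the next search begins. That same state is what makes exec() loops work. A minimal sketch (collectMatches is a hypothetical name, not from the quoted book) that walks every match in one string and records lastIndex after each:

```javascript
// collectMatches is a hypothetical helper: it relies on exec() resuming at
// lastIndex and advancing it past each match on a global pattern.
function collectMatches(pattern, input) {
  if (!pattern.global) {
    throw new TypeError('collectMatches expects a pattern with the g flag');
  }
  pattern.lastIndex = 0; // start from the beginning of input
  const found = [];
  let m;
  while ((m = pattern.exec(input)) !== null) {
    found.push({ match: m[0], index: m.index, lastIndex: pattern.lastIndex });
    if (m[0] === '') pattern.lastIndex++; // step past empty matches to avoid looping forever
  }
  return found; // the final failed exec() has reset lastIndex to 0
}

const aPattern = /a/g;
const matches = collectMatches(aPattern, 'banana');
console.log(matches.map((m) => m.index));     // [ 1, 3, 5 ]
console.log(matches.map((m) => m.lastIndex)); // [ 2, 4, 6 ]
```

In ES2020 environments, String.prototype.matchAll iterates the same matches on an internal copy of the pattern, so the original object's lastIndex is left alone.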

answered Oct 23 '22 at 18:10 by Hyperboreus