
Regex in JavaScript not as greedy as it should be?

I wrote a simple regex for capturing a certain group in a string:

/[a-z]+([0-9]+)[a-z]+/gi    (n chars, m digits, k chars).

Code:

var myString='aaa111bbb222ccc333ddd';
var myRegexp=/[a-z]+([0-9]+)[a-z]+/gi;

var match=myRegexp.exec(myString);
console.log(match);

while (match != null)
{
  match = myRegexp.exec(myString);
  console.log(match);
}

The results were:

["aaa111bbb", "111"]
["ccc333ddd", "333"]
null

But wait a minute, why didn't it try the bbb222ccc part?

I mean, it saw aaa111bbb, but then it should have tried bbb222ccc... (that's greedy!)

What am I missing?

Also, looking at

   while (match != null)
    {
      match = myRegexp.exec(myString);
      console.log(match)
    }

how did it progress to the second result? At first there was:

var match=myRegexp.exec(myString);

later (in the while loop):

match=myRegexp.exec(myString);
match=myRegexp.exec(myString);

It is the same line... where does it remember that the first result was already returned?

Royi Namir asked Nov 30 '12 14:11



2 Answers

.exec is stateful when you use the g flag. The state is kept in the regex object's .lastIndex property.

var myString = 'aaa111bbb222ccc333ddd';
var myRegexp = /[a-z]+([0-9]+)[a-z]+/gi;
var match = myRegexp.exec(myString);
console.log(myRegexp.lastIndex); // 9, so the next `.exec` will start searching at index 9
while (match != null) {
    match = myRegexp.exec(myString);
    console.log(myRegexp.lastIndex);
}

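The state can be reset by setting .lastIndex to 0. It is also reset to 0 whenever an exec call fails to find a match. re.exec("") for instance will reset the state, because the stored lastIndex (9, kept from 'aaa111bbb222ccc333ddd') is past the end of the empty string, so the match fails. Note that the regex does not remember which string it was used on: calling exec on a different string that is long enough simply continues searching from the stored lastIndex.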
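
As a rough illustration of how lastIndex carries over and gets reset, here is a minimal sketch. The function name traceLastIndex and the second test string are made up for this example; only the regex comes from the question.

```javascript
// Sketch: follow lastIndex on a global regex across several exec calls.
// traceLastIndex and the string 'qqq000rrr444sss555ttt' are illustrative only.
function traceLastIndex() {
  var re = /[a-z]+([0-9]+)[a-z]+/gi;
  var log = [];

  re.exec('aaa111bbb222ccc333ddd');
  log.push(re.lastIndex);            // index just past "aaa111bbb"

  // A different string does not reset anything by itself:
  // the search resumes at the stored lastIndex and skips "qqq000rrr".
  var other = 'qqq000rrr444sss555ttt';
  log.push(re.exec(other)[0]);

  // Manual reset: the search starts from the beginning again.
  re.lastIndex = 0;
  log.push(re.exec(other)[0]);

  // A failed match resets lastIndex to 0 (here lastIndex is past the end of "").
  re.exec('');
  log.push(re.lastIndex);

  return log;
}

console.log(traceLastIndex()); // [9, "sss555ttt", "qqq000rrr", 0]
```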

The same applies to the .test method, so never use the g flag on a regex that is used with .test if you prefer no surprises. See https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/RegExp/exec
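
A small sketch of the .test surprise, assuming a hypothetical helper testTwice that just calls .test twice on the same regex and string:

```javascript
// Sketch: calling .test twice with the same regex and the same string.
// testTwice is an illustrative helper, not part of any library.
function testTwice(re, s) {
  return [re.test(s), re.test(s)];
}

// With g, the second call starts at lastIndex (end of the string) and fails.
console.log(testTwice(/[0-9]+/g, 'abc123')); // [true, false]

// Without g, lastIndex is ignored and both calls succeed.
console.log(testTwice(/[0-9]+/, 'abc123'));  // [true, true]
```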

Esailija answered Oct 05 '22 05:10


You can also update the lastIndex property manually, so that the next search starts inside the previous match and overlapping matches are found:

var myString='aaa111bbb222ccc333ddd';
var myRegexp=/[a-z]+([0-9]+)[a-z]+/gi;

var match=myRegexp.exec(myString);
console.log(match);

while (match != null)
{
  myRegexp.lastIndex -= match[0].length - 1; // Set the cursor to the position just after the beginning of the previous match
  match = myRegexp.exec(myString);
  console.log(match);
}

See the MDN documentation for exec.
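
A different way to get overlapping matches, not taken from the answers here, is to put the trailing letters in a lookahead so they are checked but not consumed. This is only a sketch; the function name overlappingMatches and the extra capture groups are assumptions made for the example.

```javascript
// Sketch: overlapping matches without touching lastIndex by hand.
// The trailing letters sit in a lookahead, so the next search can reuse them.
// overlappingMatches is an illustrative name.
function overlappingMatches(s) {
  var re = /([a-z]+)([0-9]+)(?=([a-z]+))/gi;
  var found = [];
  var m;
  while ((m = re.exec(s)) !== null) {
    // m[1] = leading letters, m[2] = digits, m[3] = letters seen by the lookahead
    found.push(m[1] + m[2] + m[3]);
  }
  return found;
}

console.log(overlappingMatches('aaa111bbb222ccc333ddd'));
// ["aaa111bbb", "bbb222ccc", "ccc333ddd"]
```

Because the lookahead consumes nothing, lastIndex after the first match is 6 rather than 9, so bbb is still available as the start of the next match.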


EDIT:

By the way, your regex should be: /[a-z]{3}([0-9]{3})[a-z]{3}/gi
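
To see why the fixed counts matter for the loop above, here is a sketch that wraps the same loop in a hypothetical helper, collectWithRewind, and runs it with both regexes. The helper name is made up; the regex passed in must have the g flag, otherwise lastIndex is ignored and the loop would never end.

```javascript
// Sketch: the manual lastIndex rewind from this answer, collecting the matches.
// collectWithRewind is an illustrative name; re MUST be a global (g) regex.
function collectWithRewind(re, s) {
  var found = [];
  re.lastIndex = 0;
  var match = re.exec(s);
  while (match != null) {
    found.push(match[0]);
    // Restart one character after the beginning of the previous match.
    re.lastIndex -= match[0].length - 1;
    match = re.exec(s);
  }
  return found;
}

var s = 'aaa111bbb222ccc333ddd';

// Fixed counts: exactly the three overlapping groups.
console.log(collectWithRewind(/[a-z]{3}([0-9]{3})[a-z]{3}/gi, s));
// ["aaa111bbb", "bbb222ccc", "ccc333ddd"]

// With +, shorter prefixes such as "aa111bbb" and "a111bbb" also match.
console.log(collectWithRewind(/[a-z]+([0-9]+)[a-z]+/gi, s).length); // 9
```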

Samuel Caillerie answered Oct 05 '22 06:10