
JavaScript regular expressions and sub-matches

Why do JavaScript sub-matches stop working when the g modifier is set?

var text = 'test test test test';
var result = text.match(/t(e)(s)t/); // Result: ["test", "e", "s"]

The above works fine, result[1] is "e" and result[2] is "s".

var result = text.match(/t(e)(s)t/g); // Result: ["test", "test", "test", "test"] 

The above ignores my capturing groups. Is the following the only valid solution?

var result = text.match(/test/g);
for (var i in result) {
    console.log(result[i].match(/t(e)(s)t/));
}
/* Result:
["test", "e", "s"]
["test", "e", "s"]
["test", "e", "s"]
["test", "e", "s"]
*/

EDIT:

I am back again to happily tell you that, 10 years later, you can now do this (.matchAll has been added to the spec):

let result = [...text.matchAll(/t(e)(s)t/g)]; 
Chad Scira asked May 09 '09 20:05



2 Answers

Using String's match() function won't return captured groups if the global modifier is set, as you found out.

In this case, use a RegExp object and call its exec() function. String's match() behaves almost identically to RegExp's exec(), except in cases like this one: with the global modifier set, match() returns only the full matches, while exec() still returns the captured groups. (Noted here, among other places.)

Another catch to remember is that exec() doesn't return the matches in one big array. Each call returns the next match, and once it runs out of matches it returns null.

So, for example, you could do something like this:

var pattern = /t(e)(s)t/g;  // Alternatively, "new RegExp('t(e)(s)t', 'g');"
var match;

while (match = pattern.exec(text)) {
    // Do something with the match (["test", "e", "s"]) here...
}
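If you want all matches in a single array after all, the loop above can be wrapped in a small helper. This is only a sketch: `collectMatches` is a made-up name, not part of any library, and it assumes the pattern carries the g flag.

```javascript
// Hypothetical helper (not from the original answer): gather every match,
// capture groups included, by calling exec() until it returns null.
function collectMatches(pattern, text) {
    if (!pattern.global) {
        // Without the g flag exec() never advances lastIndex, so the loop would never end.
        throw new TypeError('collectMatches expects a pattern with the g flag');
    }
    pattern.lastIndex = 0; // start from the beginning regardless of earlier calls
    var matches = [];
    var match;
    while ((match = pattern.exec(text)) !== null) {
        if (match[0] === '') {
            pattern.lastIndex++; // step past empty matches to avoid an infinite loop
        }
        matches.push(Array.from(match)); // plain array: [full match, group 1, group 2, ...]
    }
    return matches;
}

var collected = collectMatches(/t(e)(s)t/g, 'test test test test');
console.log(collected.length); // 4
console.log(collected[0]);     // ["test", "e", "s"]
```

Resetting lastIndex at the start makes the helper safe to call with a regex that has already been used elsewhere.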

Another thing to note is that, when the g flag is set, RegExp.prototype.exec() and RegExp.prototype.test() both start searching the provided string at RegExp.prototype.lastIndex and report the first match from there (exec() returns the match, test() returns true). Each successful call moves lastIndex to the position just past that match, so successive calls step through the string; a call that finds nothing resets lastIndex to 0.

Here's an example:

// Remember there are 4 matches in the example text, and pattern.lastIndex starts at 0.
pattern.test(text); // pattern.lastIndex = 4
pattern.test(text); // pattern.lastIndex = 9
pattern.exec(text); // pattern.lastIndex = 14
pattern.exec(text); // pattern.lastIndex = 19

// If we were to call pattern.exec(text) again, it would return null and reset pattern.lastIndex to 0.
var match;
while (match = pattern.exec(text)) {
    // Never gets run because we already traversed the string
    console.log(match);
}

pattern.test(text); // pattern.lastIndex = 4
pattern.test(text); // pattern.lastIndex = 9

// However, we can reset lastIndex, which lets us traverse the string again
// from the start (or from any specific position in the string).
pattern.lastIndex = 0;

while (match = pattern.exec(text)) {
    // Outputs all matches
    console.log(match);
}
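One practical consequence of the lastIndex behaviour described above is that reusing a single g-flagged regex with test() can give alternating results, even on the same input. A minimal sketch (the variable names are only illustrative):

```javascript
var shared = /t(e)(s)t/g;

var first = shared.test('test');  // true: match found at index 0, lastIndex becomes 4
var second = shared.test('test'); // false: search starts at index 4, lastIndex resets to 0
var third = shared.test('test');  // true again, starting from index 0

// Without the g flag, lastIndex is ignored and every call starts at index 0.
var plain = /t(e)(s)t/;
var fourth = plain.test('test'); // true
var fifth = plain.test('test');  // true

console.log(first, second, third, fourth, fifth); // true false true true true
```

If you only need a yes/no answer, leaving off the g flag (or resetting lastIndex before each call) avoids this.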

You can find information on how to use RegExp objects on the MDN (specifically, here's the documentation for the exec() function).

hbw answered Oct 04 '22 17:10


I am surprised to see that I am the first person to answer this question with the answer I was looking for 10 years ago (the answer did not exist yet). I also was hoping that the actual spec writers would have answered it before me ;).

.matchAll has already been added to a few browsers.

In modern JavaScript we can now accomplish this by doing the following.

let result = [...text.matchAll(/t(e)(s)t/g)]; 
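To show what each entry of that result looks like, here is a short sketch that walks the iterator directly instead of spreading it. The names `sample` and `rows` are made up for the example. Note that matchAll throws a TypeError when the regex lacks the g flag.

```javascript
const sample = 'test test test test';
const rows = [];

// Each item is a match array: [full match, group 1, group 2], plus an index property.
for (const match of sample.matchAll(/t(e)(s)t/g)) {
    const [whole, first, second] = match;
    rows.push({ whole, first, second, index: match.index });
}

console.log(rows.length); // 4
console.log(rows[1]);     // { whole: 'test', first: 'e', second: 's', index: 5 }

// matchAll insists on a global regex.
let threw = false;
try {
    sample.matchAll(/t(e)(s)t/);
} catch (err) {
    threw = err instanceof TypeError;
}
console.log(threw); // true
```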

.matchAll spec

.matchAll docs

I now maintain an isomorphic JavaScript library that helps with a lot of this type of string parsing. You can check it out here: string-saw. It makes .matchAll easier to use with named capture groups.

An example would be

saw(text).matchAll(/t(e)(s)t/g) 

Which outputs a more user-friendly array of matches, and if you want to get fancy you can throw in named capture groups and get an array of objects.
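For comparison, an array of objects can also be built with native matchAll and named capture groups. This sketch does not use string-saw and makes no claim about its output format; the group names `vowel` and `consonant` are invented for illustration.

```javascript
const sampleText = 'test test test test';

// Map each match to a plain object built from its named groups.
const named = Array.from(
    sampleText.matchAll(/t(?<vowel>e)(?<consonant>s)t/g),
    (m) => ({ ...m.groups })
);

console.log(named.length); // 4
console.log(named[0]);     // { vowel: 'e', consonant: 's' }
```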

Chad Scira answered Oct 04 '22 15:10