
How to find indices of groups in JavaScript regular expressions match? [duplicate]

When I write a regular expression like:

var m = /(s+).*?(l)[^l]*?(o+)/.exec("this is hello to you");
console.log(m);

I get a match object containing the following:

{
  0: "s is hello",
  1: "s",
  2: "l",
  3: "o",
  index: 3,
  input: "this is hello to you"
}

I know the index of the entire match from the index property, but I also need to know the start and end of each matched group. A simple string search for the captured text won't work: in this example it would find the first 'l' rather than the one the group actually matched.
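To make the problem concrete, here is a small sketch of the naive search on the example above. The variable names (`naive`, `firstL`, `secondL`) are only for illustration.

```javascript
var input = "this is hello to you";
var m = /(s+).*?(l)[^l]*?(o+)/.exec(input);

// Naive approach: look for the captured text, starting at the match index.
var naive = input.indexOf(m[2], m.index);

// The regex has to backtrack past the first 'l' (it is followed by another
// 'l', not by 'o'), so group 2 really captured the second 'l'.
var firstL = input.indexOf("l");
var secondL = input.indexOf("l", firstL + 1);

console.log(naive, firstL, secondL); // 10 10 11
```

The naive search lands on index 10, while the 'l' captured by the group sits at index 11.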

Is there any way to get the offset of a matched group?

Michael Andersen asked Dec 31 '09 14:12



2 Answers

You can't directly get the index of a match group. What you have to do is first put every character in a match group, even the ones you don't care about:

var m= /(s+)(.*?)(l)([^l]*?)(o+)/.exec('this is hello to you'); 

Now you've got the whole match in parts:

['s is hello', 's', ' is hel', 'l', '', 'o'] 

So you can add up the lengths of the strings before your group to get the offset from the match index to the group index:

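function indexOfGroup(match, n) {
    // Start at the offset of the whole match in the input string.
    var ix = match.index;
    // Add the length of every group that precedes group n.
    // A group that did not participate is undefined and counts as length 0.
    for (var i = 1; i < n; i++)
        ix += (match[i] || '').length;
    return ix;
}

console.log(indexOfGroup(m, 3)); // 11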
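This answer predates ES2022, which added the `d` (hasIndices) regex flag. As a hedged alternative, assuming your runtime supports that flag, the original unmodified pattern can report group offsets itself through `m.indices`:

```javascript
// Assumption: the engine supports the ES2022 `d` flag (match indices).
var m = /(s+).*?(l)[^l]*?(o+)/d.exec("this is hello to you");

// m.indices[n] is a [start, end] pair for group n, with end exclusive.
// It is undefined for a group that did not participate in the match.
var start = m.indices[2][0];
var end = m.indices[2][1];

console.log(start, end); // 11 12
```

On older engines the flag is a syntax error, so the group-everything approach above is still the portable fallback.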
bobince answered Sep 24 '22 01:09


I wrote a simple (well, the initialization got a bit bloated) JavaScript object to solve this problem on a project I've been working on recently. It works the same way as the accepted answer, but it generates the new regexp and pulls out the data you requested automatically.

var exp = new MultiRegExp(/(firstBit\w+)this text is ignored(optionalBit)?/i);
var value = exp.exec("firstbitWithMorethis text is ignored");

// value is now:
// {0: {index: 0, text: 'firstbitWithMore'},
//  1: null}
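For readers who want to see the idea without the library, here is a minimal sketch of the extraction half only. It is not MultiRegExp's actual code: `groupOffsets` is a hypothetical helper, and it assumes the regex has already been rewritten by hand so that every part of the pattern sits in its own top-level capturing group, as in the accepted answer.

```javascript
// Hypothetical helper, not part of MultiRegExp.
// Assumes every part of the pattern is wrapped in a top-level capturing
// group, so the groups tile the whole match from left to right.
function groupOffsets(regex, input) {
    var m = regex.exec(input);
    if (m === null) return null;

    var result = {};
    var ix = m.index; // running offset into the input string
    for (var i = 1; i < m.length; i++) {
        if (m[i] === undefined) {
            // Optional group that did not participate in the match.
            result[i] = null;
        } else {
            result[i] = { index: ix, text: m[i] };
            ix += m[i].length;
        }
    }
    return result;
}

// The "ignored" text is wrapped in a group by hand here.
var offsets = groupOffsets(
    /(firstBit\w+)(this text is ignored)(optionalBit)?/i,
    "firstbitWithMorethis text is ignored"
);

console.log(offsets[1]); // { index: 0, text: 'firstbitWithMore' }
console.log(offsets[2]); // { index: 16, text: 'this text is ignored' }
console.log(offsets[3]); // null
```

Unlike the library output shown above, this sketch keys the result by the rewritten regex's own group numbers, so the filler group appears as entry 2.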

Git Repo: My MultiRegExp. Hope this helps someone out there.

edit Aug, 2015:

Try me: MultiRegExp Live.

Delus answered Sep 23 '22 01:09