When I write a regular expression like:
var m = /(s+).*?(l)[^l]*?(o+)/.exec("this is hello to you"); console.log(m);
I get a match object containing the following:
{ 0: "s is hello", 1: "s", 2: "l", 3: "o", index: 3, input: "this is hello to you" }
I know the index of the entire match from the index property, but I also need to know the start and end of the groups matched. Using a simple search won't work: in this example it would find the first 'l' instead of the one matched by the group.
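To make the problem concrete, here is a minimal sketch of the "simple search" idea, using the regular expression from the question. The variable name `naiveOffset` is only illustrative.

```javascript
// Why a plain search is not enough: indexOf finds the first "l" inside the
// match, but group 2 actually captured the second one.
const m = /(s+).*?(l)[^l]*?(o+)/.exec("this is hello to you");

// Naive guess: search for the group's text, starting at the match index.
const naiveOffset = m.input.indexOf(m[2], m.index);

console.log(naiveOffset); // 10, the first "l" of "hello"
// The group really matched the "l" at index 11: the lazy .*? had to grow by
// one more character before (l)[^l]*?(o+) could succeed.
```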
Is there any way to get the offset of a matched group?
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
Groups group multiple patterns as a whole, and capturing groups provide extra submatch information when using a regular expression pattern to match against a string. Backreferences refer to a previously captured group in the same regular expression.
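As a small illustration of the two ideas above, the sketch below captures a group and then reuses it through a backreference. The sample strings are made up for the example.

```javascript
// (dog) captures the literal text "dog" as group 1.
const grouped = /(dog)/.exec("hotdog stand");
console.log(grouped[1]);    // "dog"
console.log(grouped.index); // 3

// \1 is a backreference: it matches the same text group 1 just captured,
// so (\w)\1 finds a doubled character.
const doubled = /(\w)\1/.exec("hello");
console.log(doubled[0]); // "ll"
console.log(doubled[1]); // "l"
```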
In regex flavors that support inline modifiers, (?i) makes the regex case insensitive, and (?s), for "single line mode", makes the dot match all characters, including line breaks.
Most characters, including all letters (a-z and A-Z) and digits (0-9), match themselves. For example, the regex x matches the substring "x"; z matches "z"; and 9 matches "9". Non-alphanumeric characters without special meaning in regex also match themselves. For example, = matches "="; @ matches "@".
A plain exec() result doesn't give you the index of a match group directly. A workaround is to put every character of the pattern in a match group, even the ones you don't care about:
var m= /(s+)(.*?)(l)([^l]*?)(o+)/.exec('this is hello to you');
Now you've got the whole match in parts:
['s is hello', 's', ' is hel', 'l', '', 'o']
So you can add up the lengths of the strings before your group to get the offset from the match index to the group index:
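// Offset of group n = match index + lengths of all groups before it.
// Relies on the groups being non-nested and covering the whole match.
function indexOfGroup(match, n) {
    var ix = match.index;
    for (var i = 1; i < n; i++)
        ix += match[i].length;
    return ix;
}
console.log(indexOfGroup(m, 3)); // 11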
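Where the engine supports the ES2022 `d` (hasIndices) flag, the group offsets are available without adding extra groups. The sketch below assumes such an engine (for example a recent Node.js or browser) and uses the original, unmodified pattern from the question.

```javascript
// Assumption: the runtime supports the ES2022 "d" (hasIndices) flag.
const re = /(s+).*?(l)[^l]*?(o+)/d;
const match = re.exec("this is hello to you");

// match.indices[n] is a [start, end] pair for group n, relative to the input
// string; end is exclusive. It is undefined for a group that did not match.
const [start, end] = match.indices[2];
console.log(start, end);       // 11 12
console.log(match.indices[0]); // [ 3, 13 ]
```

On older engines the `d` flag is a syntax error, so the extra-groups approach remains the fallback there.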
I wrote a simple JavaScript object (well, the initialization got a bit bloated) to solve this problem on a project I've been working on recently. It works the same way as the accepted answer, but it generates the new regexp and pulls out the data you requested automatically.
var exp = new MultiRegExp(/(firstBit\w+)this text is ignored(optionalBit)?/i);
var value = exp.exec("firstbitWithMorethis text is ignored");
// value is {0: {index: 0, text: 'firstbitWithMore'}, 1: null}
Git Repo: My MultiRegExp. Hope this helps someone out there.
edit Aug, 2015:
Try me: MultiRegExp Live.