
Get index of each capture in a JavaScript regex

I want to match a regex like /(a).(b)(c.)d/ with "aabccde", and get the following information back:

    "a" at index = 0
    "b" at index = 2
    "cc" at index = 3

How can I do this? String.match returns the list of matches and the index of the start of the complete match, not the index of every capture.

Edit: A test case which wouldn't work with plain indexOf

    regex: /(a).(.)/
    string: "aaa"
    expected result: "a" at 0, "a" at 2

Note: The question is similar to Javascript Regex: How to find index of each subexpression?, but I cannot modify the regex to make every subexpression a capturing group.
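A minimal sketch of why plain indexOf is not enough for the test case above. The variable names are illustrative only:

```javascript
const input = 'aaa';
const m = input.match(/(a).(.)/);

// match() exposes only the start of the whole match.
console.log(m.index); // 0

// Searching for the second capture's text finds the first "a", not the one
// the group actually matched at index 2.
const naive = input.indexOf(m[2]);
console.log(naive); // 0

// Even searching after the end of the first capture lands on index 1,
// because the unparenthesized "." consumed that character.
const afterFirst = input.indexOf(m[2], m.index + m[1].length);
console.log(afterFirst); // 1
```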

user1527166 asked Apr 10 '13 19:04



1 Answer

There is a proposal (stage 4, i.e. finished, and included in ECMAScript 2022) to implement this in native JavaScript:

RegExp Match Indices for ECMAScript

ECMAScript RegExp Match Indices provide additional information about the start and end indices of captured substrings relative to the start of the input string.

...We propose the adoption of an additional indices property on the array result (the substrings array) of RegExp.prototype.exec(). This property would itself be an indices array containing a pair of start and end indices for each captured substring. Any unmatched capture groups would be undefined, similar to their corresponding element in the substrings array. In addition, the indices array would itself have a groups property containing the start and end indices for each named capture group.

Here's an example of how things would work. The following snippets run without errors in, at least, Chrome:

    const re1 = /a+(?<Z>z)?/d;

    // indices are relative to start of the input string:
    const s1 = "xaaaz";
    const m1 = re1.exec(s1);
    console.log(m1.indices[0][0]); // 1
    console.log(m1.indices[0][1]); // 5
    console.log(s1.slice(...m1.indices[0])); // "aaaz"

    console.log(m1.indices[1][0]); // 4
    console.log(m1.indices[1][1]); // 5
    console.log(s1.slice(...m1.indices[1])); // "z"

    console.log(m1.indices.groups["Z"][0]); // 4
    console.log(m1.indices.groups["Z"][1]); // 5
    console.log(s1.slice(...m1.indices.groups["Z"])); // "z"

    // capture groups that are not matched return `undefined`:
    const m2 = re1.exec("xaaay");
    console.log(m2.indices[1]); // undefined
    console.log(m2.indices.groups.Z); // undefined

So, for the code in the question, we could do:

    const re = /(a).(b)(c.)d/d;
    const str = 'aabccde';
    const result = re.exec(str);
    // indices[0], like result[0], describes the indices of the full match
    const matchStart = result.indices[0][0];
    result.forEach((matchedStr, i) => {
      const [startIndex, endIndex] = result.indices[i];
      console.log(`${matchedStr} from index ${startIndex} to ${endIndex} in the original string`);
      console.log(`From index ${startIndex - matchStart} to ${endIndex - matchStart} relative to the match start\n-----`);
    });

Output:

    aabccd from index 0 to 6 in the original string
    From index 0 to 6 relative to the match start
    -----
    a from index 0 to 1 in the original string
    From index 0 to 1 relative to the match start
    -----
    b from index 2 to 3 in the original string
    From index 2 to 3 relative to the match start
    -----
    cc from index 3 to 5 in the original string
    From index 3 to 5 relative to the match start
    -----

Keep in mind that the indices array contains the indices of the matched groups relative to the start of the string, not relative to the start of the match.
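As a rough sketch, the same idea can be wrapped in a small helper that returns only the capture groups with their absolute offsets. The name `captureIndices` and the shape of the returned objects are hypothetical and not part of any standard API; the helper assumes an engine that supports the `d` flag:

```javascript
// Hypothetical helper: one entry per capture group, with its text and its
// start/end offsets relative to the start of the input string.
function captureIndices(regex, input) {
  // Make sure the `d` flag is set; without it, `indices` is undefined.
  const flags = regex.flags.includes('d') ? regex.flags : regex.flags + 'd';
  const match = new RegExp(regex.source, flags).exec(input);
  if (match === null) return [];

  // Skip element 0 (the full match). Unmatched groups have no index pair.
  return match.slice(1).map((text, i) => {
    const pair = match.indices[i + 1];
    return pair === undefined
      ? { group: i + 1, text: undefined, start: -1, end: -1 }
      : { group: i + 1, text, start: pair[0], end: pair[1] };
  });
}

// The test case from the question: "a" at 0, "a" at 2.
console.log(captureIndices(/(a).(.)/, 'aaa'));
// The original example: "a" at 0, "b" at 2, "cc" at 3.
console.log(captureIndices(/(a).(b)(c.)d/, 'aabccde'));
```

Because the helper builds a new RegExp, it leaves the caller's regex (and its lastIndex) untouched and does not require any change to the pattern itself.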


A polyfill is available here.
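To decide at runtime whether a polyfill is needed, one option is a small feature check. This is only a sketch: the function name `supportsMatchIndices` is made up for illustration, and it relies on engines without the feature throwing a SyntaxError for an unknown flag:

```javascript
// Hypothetical feature check for the `d` (match indices) flag.
function supportsMatchIndices() {
  try {
    // Engines without support throw when constructing the RegExp.
    return new RegExp('a', 'd').exec('a').indices !== undefined;
  } catch (err) {
    return false;
  }
}

console.log(supportsMatchIndices());
```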

CertainPerformance answered Oct 02 '22 08:10
