I want to match a portion of a string using a regular expression and then access that parenthesized substring:
var myString = "something format_abc"; // I want "abc" var arr = /(?:^|\s)format_(.*?)(?:\s|$)/.exec(myString); console.log(arr); // Prints: [" format_abc", "abc"] .. so far so good. console.log(arr[1]); // Prints: undefined (???) console.log(arr[0]); // Prints: format_undefined (!!!)
What am I doing wrong?
I've discovered that there was nothing wrong with the regular expression code above: the actual string which I was testing against was this:
"date format_%A"
Reporting that "%A" is undefined seems a very strange behaviour, but it is not directly related to this question, so I've opened a new one, Why is a matched substring returning "undefined" in JavaScript?.
The issue was that console.log
takes its parameters like a printf
statement, and since the string I was logging ("%A"
) had a special value, it was trying to find the value of the next parameter.
To find out how many groups are present in the expression, call the groupCount method on a matcher object. The groupCount method returns an int showing the number of capturing groups present in the matcher's pattern. There is also a special group, group 0, which always represents the entire expression.
The method str. match(regexp) finds matches for regexp in the string str . If the regexp has flag g , then it returns an array of all matches as strings, without capturing groups and other details. If there are no matches, no matter if there's flag g or not, null is returned.
A part of a pattern can be enclosed in parentheses (...) . This is called a “capturing group”. That has two effects: It allows to get a part of the match as a separate item in the result array.
The REGEXMATCH function belongs to Google Sheets' suite of REGEX functions along with functions like REGEXEXTRACT and REGEXREPLACE. Its main task is to find if a string of text matches a regular expression. The function returns a TRUE if the text matches the regular expression's pattern and a FALSE if it doesn't.
You can access capturing groups like this:
var myString = "something format_abc"; var myRegexp = /(?:^|\s)format_(.*?)(?:\s|$)/g; var myRegexp = new RegExp("(?:^|\s)format_(.*?)(?:\s|$)", "g"); var match = myRegexp.exec(myString); console.log(match[1]); // abc
And if there are multiple matches you can iterate over them:
var myString = "something format_abc"; var myRegexp = new RegExp("(?:^|\s)format_(.*?)(?:\s|$)", "g"); match = myRegexp.exec(myString); while (match != null) { // matched text: match[0] // match start: match.index // capturing group n: match[n] console.log(match[0]) match = myRegexp.exec(myString); }
As you can see the way to iterate over multiple matches was not very intuitive. This lead to the proposal of the String.prototype.matchAll
method. This new method is expected to ship in the ECMAScript 2020 specification. It gives us a clean API and solves multiple problems. It has been started to land on major browsers and JS engines as Chrome 73+ / Node 12+ and Firefox 67+.
The method returns an iterator and is used as follows:
const string = "something format_abc"; const regexp = /(?:^|\s)format_(.*?)(?:\s|$)/g; const matches = string.matchAll(regexp); for (const match of matches) { console.log(match); console.log(match.index) }
As it returns an iterator, we can say it's lazy, this is useful when handling particularly large numbers of capturing groups, or very large strings. But if you need, the result can be easily transformed into an Array by using the spread syntax or the Array.from
method:
function getFirstGroup(regexp, str) { const array = [...str.matchAll(regexp)]; return array.map(m => m[1]); } // or: function getFirstGroup(regexp, str) { return Array.from(str.matchAll(regexp), m => m[1]); }
In the meantime, while this proposal gets more wide support, you can use the official shim package.
Also, the internal workings of the method are simple. An equivalent implementation using a generator function would be as follows:
function* matchAll(str, regexp) { const flags = regexp.global ? regexp.flags : regexp.flags + "g"; const re = new RegExp(regexp, flags); let match; while (match = re.exec(str)) { yield match; } }
A copy of the original regexp is created; this is to avoid side-effects due to the mutation of the lastIndex
property when going through the multple matches.
Also, we need to ensure the regexp has the global flag to avoid an infinite loop.
I'm also happy to see that even this StackOverflow question was referenced in the discussions of the proposal.
Here’s a method you can use to get the nth capturing group for each match:
function getMatches(string, regex, index) { index || (index = 1); // default to the first capturing group var matches = []; var match; while (match = regex.exec(string)) { matches.push(match[index]); } return matches; } // Example : var myString = 'something format_abc something format_def something format_ghi'; var myRegEx = /(?:^|\s)format_(.*?)(?:\s|$)/g; // Get an array containing the first capturing group for every match var matches = getMatches(myString, myRegEx, 1); // Log results document.write(matches.length + ' matches found: ' + JSON.stringify(matches)) console.log(matches);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With