How can I match overlapping strings with regex?

Let's say I have the string

"12345" 

If I call .match(/\d{3}/g) on it, I only get one match, "123". Why don't I get [ "123", "234", "345" ]?

user3025492 asked Dec 30 '13 04:12



2 Answers

String#match with a global-flag regex returns an array of matched substrings. The /\d{3}/g regex matches and consumes a 3-digit sequence (that is, it reads the digits into the buffer and advances its index to the position right after the last matched character). Thus, after "eating up" 123, the index is located after 3, and the only substring left for parsing is 45 - no match here.
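The consumption described above can be observed directly by calling exec twice and reading lastIndex in between. This is only a minimal sketch; the variable names are illustrative and not part of the original question.

```javascript
// Illustrative names, chosen for this sketch only.
const consuming = /\d{3}/g;
const input = '12345';

// First call matches "123" and leaves lastIndex just past it.
const first = consuming.exec(input);
const indexAfterFirst = consuming.lastIndex; // 3

// Second call starts at index 3, where only "45" remains, so it fails.
const second = consuming.exec(input); // null
const indexAfterSecond = consuming.lastIndex; // a failed exec resets it to 0

console.log(first[0], indexAfterFirst, second, indexAfterSecond);
// 123 3 null 0
```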

I think the technique used at regex101.com is also worth considering here: use a zero-width assertion (a positive lookahead with a capturing group) to test all positions inside the input string. After each test, the regex's lastIndex (a read/write integer property of a regular expression object that specifies the index at which to start the next match) is advanced "manually" to avoid an infinite loop.

Note that this technique is implemented in .NET (Regex.Matches), Python (re.findall), PHP (preg_match_all) and Ruby (String#scan), and it can be used in Java, too. Here is a demo using matchAll:

    var re = /(?=(\d{3}))/g;
    console.log( Array.from('12345'.matchAll(re), x => x[1]) );

Here is an ES5 compliant demo:

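    var re = /(?=(\d{3}))/g;
    var str = '12345';
    var m, res = [];

    while (m = re.exec(str)) {
        // The lookahead match is empty, so lastIndex does not move on its own;
        // bump it by hand to avoid matching the same position forever.
        if (m.index === re.lastIndex) {
            re.lastIndex++;
        }
        res.push(m[1]); // group 1 holds the three digits seen by the lookahead
    }

    console.log(res); // ["123", "234", "345"]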

Here is a regex101.com demo

Note that the same result can be obtained with a "regular" consuming \d{3} pattern by manually setting re.lastIndex to m.index + 1 after each successful match:

    var re = /\d{3}/g;
    var str = '12345';
    var m, res = [];

    while (m = re.exec(str)) {
        res.push(m[0]);
        re.lastIndex = m.index + 1; // <- Important
    }
    console.log(res);
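If this is needed in more than one place, the lastIndex approach could be wrapped in a small helper. The sketch below is one possible shape, not something from the answer itself: the name findOverlapping and the decision to copy the regex with a forced g flag are assumptions made for illustration.

```javascript
// Hypothetical helper: returns every match of `pattern` in `str`,
// including matches that overlap earlier ones.
function findOverlapping(pattern, str) {
  // Work on a copy so the caller's lastIndex is untouched, and make sure
  // the g flag is set (without it, exec ignores lastIndex).
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const re = new RegExp(pattern.source, flags);
  const results = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    results.push(m[0]);
    // Restart the search one position after where this match began,
    // rather than after where it ended.
    re.lastIndex = m.index + 1;
  }
  return results;
}

console.log(findOverlapping(/\d{3}/, '12345')); // [ '123', '234', '345' ]
```

Because the restart position is always m.index + 1, the loop also terminates for patterns that can match the empty string.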

Wiktor Stribiżew answered Sep 17 '22 06:09


You can't do this with a regex alone, but you can get pretty close:

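    var pat = /(?=(\d{3}))\d/g;
    var results = [];
    var match;

    while ( (match = pat.exec( '1234567' ) ) != null ) {
        // match[1] is the three-digit group captured inside the lookahead;
        // the trailing \d consumes one digit so the next search starts one position later.
        results.push( match[1] );
    }

    console.log(results); // ["123", "234", "345", "456", "567"]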

In other words, you capture all three digits inside the lookahead, then go back and match one character in the normal way just to advance the match position. It doesn't matter how you consume that character; . works just as well as \d. And if you're really feeling adventurous, you can use just the lookahead and let JavaScript handle the bump-along.
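The three variants mentioned above can be compared side by side. This is a rough sketch: collectGroup1 is a hypothetical helper name, and matchAll is used for the bare-lookahead case as one way of letting the engine step past empty matches on its own.

```javascript
const subject = '1234567';

// Hypothetical helper: run a global regex with exec and collect group 1.
function collectGroup1(re, str) {
  const out = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    out.push(m[1]);
  }
  return out;
}

// Consume one digit after the lookahead.
const withDigit = collectGroup1(/(?=(\d{3}))\d/g, subject);

// Consume any one character instead; the lookahead already did the checking.
const withDot = collectGroup1(/(?=(\d{3}))./g, subject);

// Bare lookahead: every match is empty, and matchAll advances past
// each empty match without any manual lastIndex handling.
const bare = Array.from(subject.matchAll(/(?=(\d{3}))/g), m => m[1]);

console.log(withDigit, withDot, bare);
// each array is [ '123', '234', '345', '456', '567' ]
```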

This code is adapted from this answer. I would have flagged this question as a duplicate of that one, but the OP accepted another, lesser answer.

Alan Moore answered Sep 21 '22 06:09