
Javascript Regex - Find all possible matches, even in already captured matches

I'm trying to obtain all possible matches from a string using a regex in JavaScript. My current approach does not match parts of the string that have already been consumed by an earlier match.

Variables:

var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';

var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;

Code:

var match = string.match(reg);

All matched results I get:

A1B1Y:A1B2Y
A1B5Y:A1B6Y
A1B9Y:A1B10Y

Matched results I want:

A1B1Y:A1B2Y
A1B2Y:A1B3Y
A1B5Y:A1B6Y
A1B6Y:A1B7Y
A1B9Y:A1B10Y
A1B10Y:A1B11Y

In my head, I want A1B1Y:A1B2Y to be a match along with A1B2Y:A1B3Y, even though A1B2Y in the string will need to be part of two matches.

Vinnie Cent asked Feb 13 '13 21:02


2 Answers

Without modifying your regex, you can make each search resume at the start of the second half of the previous match by using .exec and adjusting the regex object's lastIndex property.

var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
var matches = [], found;
while ((found = reg.exec(string)) !== null) {
    matches.push(found[0]);
    // Rewind lastIndex by the length of the second half (the text after ':')
    // so that the next search starts at the beginning of that half.
    reg.lastIndex -= found[0].split(':')[1].length;
}

console.log(matches);
// ["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
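
To see why the rewind is needed, it can help to trace where a plain .exec loop resumes after each match. The following is only an illustrative sketch; the trace array and its property names are made up for this example:

```javascript
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
var trace = [], found;

// Plain loop with no lastIndex adjustment: record where each match starts
// and where the regex will resume searching afterwards.
while ((found = reg.exec(string)) !== null) {
    trace.push({ match: found[0], start: found.index, resumesAt: reg.lastIndex });
}

console.log(trace);
// Each resumesAt points just past the end of the match, so the second half
// (for example A1B2Y at index 6) can never begin a new match.
```

With a global regex, lastIndex is set to the end of each match, which is why the second half of every pair is skipped unless lastIndex is moved back.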



As per Bergi's comment, you can also take the index of the last match and add 1 to it. Instead of resuming from the second half of the match, the regex then resumes from the second character of each match:

reg.lastIndex = found.index+1;


The final outcome is the same, though Bergi's version needs a little less code and performs slightly faster. =]
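
For completeness, here is a minimal sketch of the full loop with Bergi's variant, wrapped in a helper. The name findOverlapping is hypothetical and not from the original comment, and the sketch assumes an ES2015 environment where RegExp.prototype.flags is available:

```javascript
// Hypothetical helper: collects overlapping matches by resuming each search
// one character after the start of the previous match.
function findOverlapping(pattern, text) {
    // Work on a copy so the caller's lastIndex is untouched; ensure the g flag.
    var flags = pattern.flags.indexOf('g') === -1 ? pattern.flags + 'g' : pattern.flags;
    var re = new RegExp(pattern.source, flags);
    var matches = [], found;
    while ((found = re.exec(text)) !== null) {
        matches.push(found[0]);
        // Resume just past the start of this match, not past its end.
        re.lastIndex = found.index + 1;
    }
    return matches;
}

var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
var overlapping = findOverlapping(reg, string);
console.log(overlapping);
```

Because lastIndex always moves forward by at least one character, this loop should also terminate for patterns that can match the empty string.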

Fabrício Matté answered Sep 20 '22 15:09


You cannot get this result directly from match, but you can produce it with RegExp.exec and a small modification to the regex:

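// The look-ahead captures the second half in group 1 without consuming it.
var regex = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var arr;
var results = [];

while ((arr = regex.exec(input)) !== null) {
    // arr[0] is the consumed first half; arr[1] is the captured ':' plus second half.
    results.push(arr[0] + arr[1]);
}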

I used a zero-width positive look-ahead (?=pattern) so that the second half is not consumed and the overlapping portion can be matched again.
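
A variation on the same idea, which is not what the code above does, is to put the whole pair inside the look-ahead so that every match is zero-width and the pair is read from capture group 1. This sketch assumes String.prototype.matchAll (ES2020) is available; pairPattern and pairs are names chosen only for this example:

```javascript
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';

// The entire pair sits inside the look-ahead, so nothing is consumed and the
// regex is tried again at every following position in the string.
var pairPattern = /(?=(A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y))/g;

// matchAll needs the g flag; each result's group 1 holds the pair.
var pairs = Array.from(input.matchAll(pairPattern), function (m) {
    return m[1];
});

console.log(pairs);
```

The trade-off is that the engine attempts a match at every position, which is usually fine for short inputs like this one.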

Actually, it is possible to abuse the replace method to achieve the same result:

var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var results = [];

input.replace(/A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g, function ($0, $1) {
    results.push($0 + $1);
    return '';
});

However, because this is replace, it also does the useless extra work of building a replacement string that is never used.

nhahtdh answered Sep 16 '22 15:09