
Regex Group Capture [duplicate]

I have a standard email which I am looking to extract certain details from.

Amongst the email are lines like so:

<strong>Name:</strong> John Smith

So to simulate this I have the following JavaScript:

var str = "<br><strong>Name:</strong> John Smith<br>";
var re = /\<strong>Name\s*:\<\/strong>\s*([^\<]*)/g;
var match = re.exec(str);
while (match != null) {
    console.log(match[0]);
    match = re.exec(str);
}

This only comes out with one result, which is:

<strong>Name:</strong> John Smith

I was hoping to get the capture group ([^\<]*), which in this example would be John Smith.

What am I missing here?

Graham asked Aug 12 '19 12:08


2 Answers

In the array returned by exec, index 0 always holds the entire text that the pattern matched. Capture groups are numbered from 1 onwards, so to fix your issue simply replace match[0] with match[1].
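
As a minimal sketch of that layout, reusing the string and pattern from the question (the variable names full and name are only illustrative, and the g flag is dropped because a single match is enough here):

```javascript
// Sample input and pattern taken from the question.
var str = "<br><strong>Name:</strong> John Smith<br>";
var re = /<strong>Name\s*:<\/strong>\s*([^<]*)/;

var match = re.exec(str);

var full = match[0]; // index 0: the entire matched text
var name = match[1]; // index 1: the first capture group

console.log(full); // <strong>Name:</strong> John Smith
console.log(name); // John Smith
```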

That being said, since you are using JavaScript, it would be better to process the DOM itself and extract the text from there, as opposed to processing HTML with regular expressions.

npinti answered Oct 18 '22 21:10


Capture groups are provided in the match array starting at index 1:

var str = "<br><strong>Name:</strong> John Smith<br>";
var re = /\<strong>Name\s*:\<\/strong>\s*([^\<]*)/g
var match = re.exec(str);
while (match != null) {
    console.log(match[1]); // <====
    match = re.exec(str);
}

Index 0 contains the whole match.

On modern JavaScript engines, you could also use named capture groups, written (?<theName>...), which you can access via match.groups.theName:

var str = "<br><strong>Name:</strong> John Smith<br>";
var re = /\<strong>Name\s*:\<\/strong>\s*(?<name>[^\<]*)/g
// ---------------------------------------^^^^^^^
var match = re.exec(str);
while (match != null) {
    console.log(match.groups.name); // <====
    match = re.exec(str);
}
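
If the email contains several labelled lines, the same idea may extend to capturing the label as well. The following is only a sketch under assumptions: the Email: line and the group names label and value are made up for illustration, and String.prototype.matchAll requires an ES2020 engine.

```javascript
// Hypothetical input: the "Email:" line is invented for this example.
var str = "<br><strong>Name:</strong> John Smith<br>" +
          "<strong>Email:</strong> john@example.com<br>";

// One named group for the label, one for the value that follows it.
var re = /<strong>(?<label>[^<:]+)\s*:<\/strong>\s*(?<value>[^<]*)/g;

var details = {};
for (var m of str.matchAll(re)) {
    // Named groups are exposed on m.groups.
    details[m.groups.label] = m.groups.value.trim();
}

console.log(details.Name);  // John Smith
console.log(details.Email); // john@example.com
```

Collecting the values into an object keyed by label avoids writing one pattern per field, at the cost of assuming every line follows the same <strong>Label:</strong> value shape.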

T.J. Crowder answered Oct 18 '22 23:10