I have a standard email from which I want to extract certain details.
The email contains lines like this:
<strong>Name:</strong> John Smith
So to simulate this I have the following JavaScript:
var str = "<br><strong>Name:</strong> John Smith<br>";
var re = /\<strong>Name\s*:\<\/strong>\s*([^\<]*)/g;
var match = re.exec(str);
while (match != null) {
    console.log(match[0]);
    match = re.exec(str);
}
This only comes out with one result, which is:
<strong>Name:</strong> John Smith
I was hoping to get the capture group ([^\<]*), which in this example would be John Smith.
What am I missing here?
The array returned by exec always holds the entire matched text at index 0. Capture groups start at index 1, so to fix your issue simply replace match[0] with match[1].
That being said, since you are using JavaScript, it would be better to process the DOM itself and extract the text from there, as opposed to processing HTML with regular expressions.
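As a rough sketch of the DOM-based approach suggested above: the helper name extractField and its parameters are made up for illustration, and it assumes the value is the node immediately following each <strong> label. It relies only on querySelectorAll, textContent and nextSibling, and the usage part runs only where DOMParser exists (i.e. in a browser).

```javascript
// Hypothetical helper: collects the text that directly follows
// every <strong>label:</strong> element found under `root`.
function extractField(root, label) {
    var results = [];
    var nodes = root.querySelectorAll("strong");
    for (var i = 0; i < nodes.length; i++) {
        var strong = nodes[i];
        // Strip the trailing colon so "Name:" and "Name :" both compare as "Name".
        var text = strong.textContent.replace(/\s*:\s*$/, "").trim();
        var next = strong.nextSibling;
        // Skip labels that do not match or that have no text right after them.
        if (text === label && next && next.textContent) {
            results.push(next.textContent.trim());
        }
    }
    return results;
}

// Browser-only usage; DOMParser is not available in plain Node.js.
if (typeof DOMParser !== "undefined") {
    var html = "<br><strong>Name:</strong> John Smith<br>";
    var doc = new DOMParser().parseFromString(html, "text/html");
    console.log(extractField(doc.body, "Name"));
}
```

Whether this is worth it over the regular expression depends on how regular the email markup really is; the DOM version tolerates extra attributes or whitespace inside the tags, which the regex does not.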
Capture groups are provided in the match array starting at index 1:
var str = "<br><strong>Name:</strong> John Smith<br>";
var re = /\<strong>Name\s*:\<\/strong>\s*([^\<]*)/g;
var match = re.exec(str);
while (match != null) {
    console.log(match[1]); // <==== capture group 1, not the whole match
    match = re.exec(str);
}
Index 0 contains the whole match.
On modern JavaScript engines, you could also use named capture groups, written (?<theName>...), which you can access via match.groups.theName:
var str = "<br><strong>Name:</strong> John Smith<br>";
var re = /\<strong>Name\s*:\<\/strong>\s*(?<name>[^\<]*)/g;
// ---------------------------------------^^^^^^^
var match = re.exec(str);
while (match != null) {
    console.log(match.groups.name); // <====
    match = re.exec(str);
}
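If the engine also supports String.prototype.matchAll (ES2020), the exec loop can probably be replaced by a single expression. This is only a sketch: the second name in the sample string is made up so that there is more than one match, and the regex must keep its g flag or matchAll throws.

```javascript
// Sample input with two "Name" lines; the second entry is invented for the demo.
var str = "<br><strong>Name:</strong> John Smith<br><strong>Name:</strong> Jane Doe<br>";
var re = /\<strong>Name\s*:\<\/strong>\s*(?<name>[^\<]*)/g;

// matchAll returns an iterator of match arrays; map each one to its named group.
var names = Array.from(str.matchAll(re), function (m) {
    return m.groups.name;
});

console.log(names); // the captured names, one per match
```

Each item produced by matchAll has the same shape as the result of exec, so m[0] is still the whole match and m[1] is still the first capture group.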