Preface: I'm aware of the general consensus against using regex to parse HTML. I'm asking in advance: please avoid any recommendations in this regard.
I have the following regex
/<div class="panel-body">([^]*?)(<\/div>|$)/gi
It matches all content inside the div with class .panel-body, including the div tags themselves.
Full match:
<div class="panel-body">
<a href="#">Link</a>
Line 1
Line 2
Line 3
</div>
It also matches content with no closing div tag.
Full match:
<div class="panel-body">
<a href="#">Link</a>
Line 1
Line 2
Line 3
Don't match after closing `div`...but match this and below in case closing `div` is removed.
Line below 1
Line below 2
Line below 3
How could I improve my regex to do the following:
Exclude the opening <div class="panel-body"> and the closing </div> (when there is a closing div tag) from the full match.
Do this directly in the full match (if possible), without using groups.
regex101.com example
The string doesn't start with <div class="panel-body">; it starts with:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Webmin 1.851 on centos.centos (CentOS Linux 7.3.1611)</title>
</head>
<body>
<div>
<div>
<div class="panel-body">
* Note: The div is never closed until the page has fully loaded, because the output is progressive.
After the answers were posted, I ran speed comparison tests. It's up to you to decide which solution serves you best.
Speed-test Results
You can use a DOM parser, which should work with incomplete tags as well:
function divContent(str) {
  // create a new div container
  var div = document.createElement('div');
  // assign your HTML to div's innerHTML
  div.innerHTML = '<html>' + str + '</html>';
  // find elements by the given class name
  var el = div.getElementsByClassName("panel-body");
  // return the innerHTML of the last matching element, or "" if none was found
  return (el.length > 0 ? el[el.length-1].innerHTML : "");
}
// extract text from a complete tag:
var html = `<div class="panel-body">
<a href="#">Link</a>
Line 1
Line 2
Line 3
</div>`;
console.log(divContent(html));
// extract text from an incomplete tag:
html = `<div class="panel-body">
<a href="#">Link</a>
Line 1
Line 2
Line 3
Don't match after closing 'div'...but match this and below
in case closing 'div' is removed.
Line below 1
Line below 2
Line below 3`;
console.log(divContent(html));
// OP's edited HTML text
html = `<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Webmin 1.851 on centos.centos (CentOS Linux 7.3.1611)</title>
</head>
<body>
<div>
<div>
<div class="panel-body">`;
console.log(divContent(html));
JS Fiddle
I can't comment yet, so I will try an answer. How about non-capturing groups? The tags are still part of the full match, but the content becomes the only capture group (index 1 in the result of exec; index 0 is still the full match).
(?:<div class="panel-body">)([^]*?)(?:<\/div>|$)
https://regex101.com/r/OJf1Rt/3
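To make the difference concrete, here is a minimal sketch in JavaScript comparing the non-capturing-group pattern above with a lookaround variant that puts only the content into the full match, which is what the question asks for. The function names (contentByGroup, contentByLookaround) and the shortened sample strings are made up for illustration, and the lookbehind assumes an engine that supports it (ES2018 or later). Neither pattern handles a nested div inside the panel: the lazy match stops at the first closing </div>.

```javascript
// Shortened sample inputs modeled on the question: one closed div, one still open.
const closed = '<div class="panel-body">\n<a href="#">Link</a>\nLine 1\n</div>\nafter';
const open = '<div class="panel-body">\n<a href="#">Link</a>\nLine 1\nLine below 1';

// Variant 1: non-capturing groups around the tags, content in capture group 1.
// The full match (m[0]) still includes the tags.
function contentByGroup(str) {
  const re = /(?:<div class="panel-body">)([^]*?)(?:<\/div>|$)/i;
  const m = re.exec(str);
  return m ? m[1] : null;
}

// Variant 2: lookbehind and lookahead assert the tags without consuming them,
// so the full match (m[0]) is only the content. Requires lookbehind support.
function contentByLookaround(str) {
  const re = /(?<=<div class="panel-body">)[^]*?(?=<\/div>|$)/i;
  const m = re.exec(str);
  return m ? m[0] : null;
}

console.log(JSON.stringify(contentByGroup(closed)));
console.log(JSON.stringify(contentByLookaround(closed)));
console.log(JSON.stringify(contentByLookaround(open)));
```

Both functions return null when the string contains no panel-body div. If the g flag is added to collect several panels, the lookaround variant can be used with str.match(re) directly, since every entry is then a full match.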