I have a string with HTML code.
<h2 class="some-class">
<a href="#link" class="link" id="first-link">
<span class="bold">link</span>
</a>
NEED TO GET THIS
</h2>
I need to get only the text content of the h2. I created this regular expression:
(?<=>)(.*)(?=<\/h2>)
But it only works if the h2 has no inner tags. Otherwise I get this:
<a href="#link" class="link" id="first-link">
<span class="bold">link</span>
</a>
NEED TO GET THIS
Never use regex to parse HTML; see these well-known answers:
Using regular expressions to parse HTML: why not?
RegEx match open tags except XHTML self-contained tags
Instead, create a temporary element, set the string as its HTML, and collect the content of the h2's direct text nodes.
var str = `<h2 class="some-class">
<a href="#link" class="link" id="first-link">
<span class="bold">link</span>
</a>
NEED TO GET THIS
</h2>`;
// generate a temporary DOM element
var temp = document.createElement('div');
// set content
temp.innerHTML = str;
// get the h2 element
var h2 = temp.querySelector('h2');
console.log(
  // get all child nodes and convert them into an array
  // for older browsers use [].slice.call(h2...)
  Array.from(h2.childNodes)
    // iterate over the nodes
    .map(function(e) {
      // if it is a text node, return its trimmed content;
      // otherwise return an empty string
      return e.nodeType === 3 ? e.textContent.trim() : '';
    })
    // join the string array
    .join('')
    // you can use the reduce method instead of map
    // .reduce(function(s, e) { return s + (e.nodeType === 3 ? e.textContent.trim() : ''); }, '')
);
Reference:
Fastest way to convert JavaScript NodeList to Array?
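The same idea can be wrapped in a small reusable function. This is only a sketch: the name getDirectText is made up for illustration, DOMParser is swapped in for the temporary div, and (unlike the snippet above) empty text nodes are dropped and the remaining pieces are joined with a single space.

```javascript
// Hypothetical helper (not part of any library): returns the text that sits
// directly inside `el`, ignoring text that belongs to nested elements.
function getDirectText(el) {
  var TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers
  return Array.from(el.childNodes)
    // keep only the direct text nodes
    .filter(function (node) { return node.nodeType === TEXT_NODE; })
    // strip surrounding whitespace and newlines
    .map(function (node) { return node.textContent.trim(); })
    // drop nodes that contained only whitespace
    .filter(Boolean)
    // assumption: separate multiple text nodes with one space
    .join(' ');
}

// Browser-only usage; skipped where no DOM is available (e.g. plain Node).
if (typeof DOMParser !== 'undefined') {
  var doc = new DOMParser().parseFromString(
    '<h2 class="some-class"><a href="#link"><span class="bold">link</span></a> NEED TO GET THIS</h2>',
    'text/html'
  );
  console.log(getDirectText(doc.querySelector('h2'))); // NEED TO GET THIS
}
```

Because the function only reads childNodes, nodeType, and textContent, it should work on any element, whether it comes from DOMParser, a temporary div, or the live page.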
Regex is not well suited to parsing HTML, but if your HTML is not valid, or you want to use regex anyway:
(?!>)([^><]+)(?=<\/h2>)
It matches the last run of text before the closing </h2> tag (if one exists).
To avoid empty matches, I changed * to +.
This regex is very limited and only fits narrow situations like the one in the question.
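For illustration, here is how that pattern might be applied in JavaScript. It is a rough sketch: the function name lastTextBeforeH2Close is hypothetical, and the trim() call is an addition, since the raw match includes the newlines around the text.

```javascript
// Hypothetical helper wrapping the regex from this answer.
function lastTextBeforeH2Close(html) {
  // ([^><]+)   : one or more characters that are not angle brackets
  // (?=<\/h2>) : which must be immediately followed by the closing </h2>
  var match = html.match(/(?!>)([^><]+)(?=<\/h2>)/);
  // trim() is an addition here: the captured text keeps its surrounding newlines
  return match ? match[1].trim() : null;
}

var str = `<h2 class="some-class">
<a href="#link" class="link" id="first-link">
<span class="bold">link</span>
</a>
NEED TO GET THIS
</h2>`;

console.log(lastTextBeforeH2Close(str)); // NEED TO GET THIS

// No text directly before </h2>, so there is nothing to match
console.log(lastTextBeforeH2Close('<h2><a href="#x">x</a></h2>')); // null
```

Note that this only sees text placed right before the closing tag; text that comes before an inner element is not captured, which is one reason the DOM approach is more dependable.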