RegExp. Get only text content of tag (without inner tags)

Question

I have a string with HTML code:

<h2 class="some-class"> 
   <a href="#link" class="link" id="first-link">
      <span class="bold">link</span>
   </a>
   NEED TO GET THIS
</h2>

I need to get only the text content of the h2. I created this regular expression:

(?<=>)(.*)(?=<\/h2>)

But it only works if the h2 has no inner tags. Otherwise I get this:

   <a href="#link" class="link" id="first-link">
      <span class="bold">link</span>
   </a>
   NEED TO GET THIS
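Running the pattern directly shows why it behaves this way. This is a sketch that assumes the string is multi-line exactly as shown (with the missing ">" on the <a> tag restored); the variable names are illustrative. Without the s (dotAll) flag, "." does not match line breaks, so the pattern finds nothing; with it, ".*" greedily runs from the first ">" to "</h2>" and swallows the inner tags, which reproduces the output above.

```javascript
// The markup from the question, as a multi-line template literal.
const html = `<h2 class="some-class">
   <a href="#link" class="link" id="first-link">
      <span class="bold">link</span>
   </a>
   NEED TO GET THIS
</h2>`;

// The question's pattern as written: "." stops at line breaks, so no
// single-line run is both preceded by ">" and directly followed by "</h2>".
const original = /(?<=>)(.*)(?=<\/h2>)/;
console.log(html.match(original)); // null

// With the s (dotAll) flag, ".*" crosses line breaks and greedily runs
// from the ">" that closes the opening <h2> tag up to "</h2>",
// so the nested <a> and <span> tags are captured as well.
const dotAll = /(?<=>)(.*)(?=<\/h2>)/s;
console.log(html.match(dotAll)[1]);
```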

Pranav C Balan · Accepted Answer

Never use regex to parse HTML; see these well-known answers:

Using regular expressions to parse HTML: why not?

RegEx match open tags except XHTML self-contained tags

Instead, create a temporary element, set the string as its HTML, and get the content by keeping only the h2's own text nodes.

var str = `<h2 class="some-class"> 
   <a href="#link" class="link" id="first-link">
      <span class="bold">link</span>
   </a>
   NEED TO GET THIS
</h2>`;

// generate a temporary DOM element
var temp = document.createElement('div');
// set content
temp.innerHTML = str;
// get the h2 element
var h2 = temp.querySelector('h2');

console.log(
  // get all child nodes and convert into array
  // for older browsers, use [].slice.call(h2...)
  Array.from(h2.childNodes)
  // map each child node to a string
  .map(function(e) {
    // if it is a text node, return its trimmed content;
    // otherwise return an empty string
    return e.nodeType === 3 ? e.textContent.trim() : '';
  })
  // join the string array
  .join('')
  // alternatively, use reduce instead of map and join:
  // .reduce(function(s, e) { return s + (e.nodeType === 3 ? e.textContent.trim() : ''); }, '') 
);

Reference:

Fastest way to convert JavaScript NodeList to Array?
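The map/join chain in the answer can also be packaged as a small helper that filters the child nodes before mapping them. This is a sketch, not part of the original answer: the name ownText is hypothetical, and 3 is the value of the standard Node.TEXT_NODE constant. In a browser you would call it as ownText(temp.querySelector('h2')) with temp built as in the answer; the demo below uses plain objects because there is no DOM outside a browser.

```javascript
// Numeric node type for text nodes (the DOM's Node.TEXT_NODE constant).
const TEXT_NODE = 3;

// Hypothetical helper: returns the trimmed text of an element's *direct*
// text nodes, skipping child elements and everything nested inside them.
function ownText(element) {
  return Array.from(element.childNodes)
    .filter(function (node) { return node.nodeType === TEXT_NODE; })
    .map(function (node) { return node.textContent.trim(); })
    .join('');
}

// Plain objects that mimic the child nodes of the question's <h2>.
const fakeH2 = {
  childNodes: [
    { nodeType: TEXT_NODE, textContent: '\n   ' },
    { nodeType: 1, textContent: '\n      link\n   ' }, // the <a> element
    { nodeType: TEXT_NODE, textContent: '\n   NEED TO GET THIS\n' }
  ]
};

console.log(ownText(fakeH2)); // NEED TO GET THIS
```

Filtering first keeps element children out of the result instead of turning them into empty strings; the output is the same. Because each text run is trimmed and joined with '', text before and after a nested tag is glued together, so joining with ' ' may read better when the h2 contains several runs.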

MohaMad · Answer

Regex is not good for parsing HTML, but if your HTML is not valid, or you want to use regex anyway:

(?!>)([^><]+)(?=<\/h2>)


It captures the last run of text before the closing </h2> tag (if one exists).
I changed * to + to avoid empty matches.
This regex is very limited and only fits situations like the one in the question.
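A minimal sketch of applying that pattern with String.prototype.match, assuming the same sample markup (with the missing ">" restored); the variable names are illustrative. The capture includes the surrounding line breaks and indentation, so the result is trimmed.

```javascript
const html = `<h2 class="some-class">
   <a href="#link" class="link" id="first-link">
      <span class="bold">link</span>
   </a>
   NEED TO GET THIS
</h2>`;

// One or more characters that are neither "<" nor ">",
// immediately followed by the closing </h2> tag.
const pattern = /(?!>)([^><]+)(?=<\/h2>)/;

// match() returns null when no text directly precedes </h2>.
const match = html.match(pattern);
const text = match ? match[1].trim() : null;
console.log(text); // NEED TO GET THIS

// Limitation: only the last text run before </h2> is found, so text
// that comes before a nested tag ("Intro" here) is ignored.
const other = '<h2>Intro <a href="#">link</a> tail</h2>';
console.log(JSON.stringify(other.match(pattern)[1])); // " tail"
```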


Tags: javascript, html, regex

andreyb1990
