Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Check if a string is html or not

A better regex to use to check if a string is HTML is:

/^/

For example:

/^/.test('') // true
/^/.test('foo bar baz') //true
/^/.test('<p>fizz buzz</p>') //true

In fact, it's so good, that it'll return true for every string passed to it, which is because every string is HTML. Seriously, even if it's poorly formatted or invalid, it's still HTML.

If what you're looking for is the presence of HTML elements, rather than simply any text content, you could use something along the lines of:

/<\/?[a-z][\s\S]*>/i.test()

It won't help you parse the HTML in any way, but it will certainly flag the string as containing HTML elements.


Method #1. Here is the simple function to test if the string contains HTML data:

function isHTML(str) {
  var a = document.createElement('div');
  a.innerHTML = str;

  for (var c = a.childNodes, i = c.length; i--; ) {
    if (c[i].nodeType == 1) return true; 
  }

  return false;
}

The idea is to allow browser DOM parser to decide if provided string looks like an HTML or not. As you can see it simply checks for ELEMENT_NODE (nodeType of 1).

I made a couple of tests and looks like it works:

isHTML('<a>this is a string</a>') // true
isHTML('this is a string')        // false
isHTML('this is a <b>string</b>') // true

This solution will properly detect HTML string, however it has side effect that img/vide/etc. tags will start downloading resource once parsed in innerHTML.

Method #2. Another method uses DOMParser and doesn't have loading resources side effects:

function isHTML(str) {
  var doc = new DOMParser().parseFromString(str, "text/html");
  return Array.from(doc.body.childNodes).some(node => node.nodeType === 1);
}

Notes:
1. Array.from is ES2015 method, can be replaced with [].slice.call(doc.body.childNodes).
2. Arrow function in some call can be replaced with usual anonymous function.


A little bit of validation with:

/<(?=.*? .*?\/ ?>|br|hr|input|!--|wbr)[a-z]+.*?>|<([a-z]+).*?<\/\1>/i.test(htmlStringHere) 

This searches for empty tags (some predefined) and / terminated XHTML empty tags and validates as HTML because of the empty tag OR will capture the tag name and attempt to find it's closing tag somewhere in the string to validate as HTML.

Explained demo: http://regex101.com/r/cX0eP2

Update:

Complete validation with:

/<(br|basefont|hr|input|source|frame|param|area|meta|!--|col|link|option|base|img|wbr|!DOCTYPE).*?>|<(a|abbr|acronym|address|applet|article|aside|audio|b|bdi|bdo|big|blockquote|body|button|canvas|caption|center|cite|code|colgroup|command|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|figcaption|figure|font|footer|form|frameset|head|header|hgroup|h1|h2|h3|h4|h5|h6|html|i|iframe|ins|kbd|keygen|label|legend|li|map|mark|menu|meter|nav|noframes|noscript|object|ol|optgroup|output|p|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video).*?<\/\2>/i.test(htmlStringHere) 

This does proper validation as it contains ALL HTML tags, empty ones first followed by the rest which need a closing tag.

Explained demo here: http://regex101.com/r/pE1mT5


zzzzBov's answer above is good, but it does not account for stray closing tags, like for example:

/<[a-z][\s\S]*>/i.test('foo </b> bar'); // false

A version that also catches closing tags could be this:

/<[a-z/][\s\S]*>/i.test('foo </b> bar'); // true