Match unclosed html tags using regex and php

I am using PHP and a regex to find unclosed HTML tags in a string.

This is my string:

$s="<div><h2>Hello world<h2><p>It's 7Am where I live<p><div>";

As you can see, none of the tags here are closed.

I want to find all unclosed tags, but my regex also matches the opening tags.

Here is my regex so far:

/<[^>]+>/i

And this is my preg_match_all() call:

preg_match_all("/<[^>]+>/i",$s,$v);

print_r($v);

What do I need to change in my regex to match only the unclosed tags?

 <h2>
 <p>
 <div>

You might be unaware of this, but DOMDocument can help you fix the HTML.

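$html = "<div><h2>Hello world<h2><p>It's 7Am where I live<p><div>";
// Collect libxml parse warnings internally instead of emitting PHP warnings
libxml_use_internal_errors(true);

$dom = new DOMDocument();
// Wrap the fragment in a temporary <root> element; the flags suppress the implied <html>/<body> and the default DOCTYPE
$dom->loadHTML('<root>' . $html . '</root>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Remove every element that has no child nodes (the empty elements left by the parser)
foreach ($xpath->query('//*[not(node())]') as $node) {
    $node->parentNode->removeChild($node);
}
// Strip the leading "<root>" (6 characters) and the trailing "</root>" plus newline (8 characters)
echo substr($dom->saveHTML(), 6, -8);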

Result: <div><h2>Hello world</h2><p>It's 7Am where I live</p></div>

Note that the XPath-based cleanup of empty nodes is necessary because the DOM contains empty <h2></h2>, <p></p> and <div></div> elements after the HTML is loaded: the parser treats each repeated tag as a new opening tag and closes it automatically.

The input is wrapped in a <root> element so that the document has a single root element. The wrapper is stripped from the output afterwards with substr.

The LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD flags are necessary so that no DOCTYPE declaration and no implied <html> or <body> elements are added to the DOM.
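The same steps can be wrapped in a reusable function. This is only a sketch under a few assumptions: repairHtmlFragment is a hypothetical name, and instead of trimming the output with substr it serializes the children of the wrapper element with saveHTML($node), which avoids the hard-coded offsets. The exact markup produced depends on how the installed libxml2 version recovers from the broken input.

```php
<?php
// Hypothetical helper (not part of the original answer): repairs an HTML
// fragment with DOMDocument, following the steps described above.
function repairHtmlFragment(string $html): string
{
    // Collect parser warnings internally and remember the previous setting
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // Temporary <root> wrapper; the flags suppress implied <html>/<body> and the DOCTYPE
    $dom->loadHTML(
        '<root>' . $html . '</root>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    // Remove elements without any child nodes.
    // Caveat: this also removes void elements such as <br> or <img>.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//*[not(node())]') as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize only the children of the wrapper, so no substr() offsets are needed
    $out  = '';
    $root = $dom->documentElement; // null if the wrapper itself was empty and got removed
    if ($root !== null) {
        foreach ($root->childNodes as $child) {
            $out .= $dom->saveHTML($child);
        }
    }

    // Discard the collected warnings and restore the previous error handling
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $out;
}

echo repairHtmlFragment("<div><h2>Hello world<h2><p>It's 7Am where I live<p><div>");
```

Because the cleanup step drops every childless element, a fragment that legitimately contains void elements such as <br> would need a narrower XPath expression.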

Finding unmatched tags seems fundamentally too hard to do with a regex alone. You basically need to push each opening tag you see onto a stack and then pop it off the stack when you see the matching closing tag. Whatever is left on the stack at the end is unclosed.
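The push-and-pop idea can be sketched as follows. This is a minimal illustration, not a validator: findUnclosedTags is a hypothetical name, the regex is used only to tokenize tags (it assumes attribute values contain no ">" character), and the list of void elements is an assumption about which tags never take a closing tag.

```php
<?php
// Hypothetical helper: returns the names of opening tags that are never closed.
function findUnclosedTags(string $html): array
{
    // Void elements never take a closing tag, so they are ignored
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
             'input', 'link', 'meta', 'source', 'track', 'wbr'];

    // Group 1: optional leading slash (closing tag); group 2: tag name
    preg_match_all('~<(/?)([a-z][a-z0-9]*)\b[^>]*>~i', $html, $matches, PREG_SET_ORDER);

    $stack = [];
    foreach ($matches as $m) {
        $name = strtolower($m[2]);

        // Skip void and self-closing tags such as <br> or <img />
        if (in_array($name, $void, true) || substr($m[0], -2) === '/>') {
            continue;
        }

        if ($m[1] === '') {
            $stack[] = $name; // opening tag: push onto the stack
            continue;
        }

        // Closing tag: remove the most recent matching opening tag, if any
        for ($i = count($stack) - 1; $i >= 0; $i--) {
            if ($stack[$i] === $name) {
                array_splice($stack, $i, 1);
                break;
            }
        }
    }

    return $stack; // whatever is left was never closed
}

$s = "<div><h2>Hello world<h2><p>It's 7Am where I live<p><div>";
echo implode(', ', findUnclosedTags($s)), "\n"; // prints: div, h2, h2, p, p, div
```

Note that for the string in the question all six tags are reported, because it contains no closing tags at all: the second <h2>, <p> and <div> are themselves opening tags. Deciding that they were meant as closing tags requires the kind of recovery an HTML parser performs.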

I recommend using a library that does HTML validation. See these questions:

Remove unmatched HTML tags in a string

How to find the unclosed div tag

PHP get all unclosed HTML tags in string

Tags: regex, php

Question by Amit Verma. 2 answers, by Wiktor Stribiżew and Cargo23.