I need a regex for preg_replace.
This question wasn't answered in "another question" because not all of the tags I want to remove are empty.
I need to remove not only empty tags from an HTML structure, but also tags that contain only line breaks, whitespace, and/or the corresponding HTML entities.
Possible codes are:
<br />, &nbsp;, &thinsp;, &ensp;, &emsp;, &#8201;, &#8194;, &#8195;
BEFORE removing matching tags:
<div>
<h1>This is a html structure.</h1>
<p>This is not empty.</p>
<p></p>
<p><br /></p>
<p> <br /> &thinsp;</p>
<p> </p>
<p> </p>
</div>
AFTER removing matching tags:
<div>
<h1>This is a html structure.</h1>
<p>This is not empty.</p>
</div>
You can use the following:
<([^>\s]+)[^>]*>(?:\s*(?:<br \/>|&nbsp;|&thinsp;|&ensp;|&emsp;|&#8201;|&#8194;|&#8195;)\s*)*<\/\1>
Replace each match with '' (the empty string).
Note: This also works for empty HTML tags that have attributes.
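For illustration, here is a minimal sketch of how the pattern could be applied with preg_replace. The function name removeEmptyTags is hypothetical, the whitespace entities in the alternation are written out as an assumption about the intended list, and the loop is an addition not present in the answer: it repeats the replacement so that a tag that only becomes empty after an inner tag is removed gets caught on a later pass.

```php
<?php
// Hypothetical helper (not part of the original answer).
function removeEmptyTags(string $html): string
{
    // Pattern from the answer, using ~ as delimiter so / needs no escaping.
    // The entity list is an assumption about which whitespace codes are meant.
    $pattern = '~<([^>\s]+)[^>]*>(?:\s*(?:<br />|&nbsp;|&thinsp;|&ensp;|&emsp;|&#8201;|&#8194;|&#8195;)\s*)*</\1>~';

    // Repeat until a pass removes nothing, so that wrappers such as
    // <div><p></p></div> are removed once their content is gone.
    do {
        $html = preg_replace($pattern, '', $html, -1, $count);
    } while ($count > 0);

    return $html;
}

$before = "<div>\n<h1>This is a html structure.</h1>\n<p>This is not empty.</p>\n"
        . "<p></p>\n<p><br /></p>\n<p>&nbsp;<br />&thinsp;</p>\n<p>&ensp;</p>\n</div>";

echo removeEmptyTags($before);
```

The line breaks that followed the removed tags stay in the output, so a few blank lines remain before the closing </div>. Also note that, as written here, a tag containing only plain whitespace (such as a single space) is not matched, because each repetition of the inner group requires a <br /> or one of the listed entities.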
Alternatively, use tidy, for example through the following function:
function cleaning($string, $tidyConfig = null) {
    // Default tidy options, used when the caller passes none.
    $config = array(
        'indent' => true,
        'show-body-only' => false,
        'clean' => true,
        'output-xhtml' => true,
        'preserve-entities' => true
    );
    if ($tidyConfig === null) {
        $tidyConfig = $config;
    }
    $out = array();
    $tidy = new tidy();
    // Repair the markup and keep the complete document.
    $out['full'] = $tidy->repairString($string, $tidyConfig, 'UTF8');
    // Keep only what sits between <body> and </body>.
    $out['body'] = preg_replace("/.*<body[^>]*>|<\/body>.*/si", "", $out['full']);
    // Keep only the contents of the <style> element, wrapped in a fresh <style> tag.
    // If the output has no <style> element, the whole document ends up here.
    $out['style'] = '<style type="text/css">' . preg_replace("/.*<style[^>]*>|<\/style>.*/si", "", $out['full']) . '</style>';
    return $out;
}