This is my PHP function to remove all empty HTML tags from a string:
/**
* Remove the nested HTML empty tags from the string.
*
* @param $string String to remove tags
* @param null $replaceTo Replace empty string with
* @return mixed Cleaned string
*/
function crl_remove_empty_tags($string, $replaceTo = null)
{
// Return if string not given or empty
if (!is_string($string) || trim($string) == '') return $string;
// Recursive empty HTML tags
return preg_replace(
'/<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|"[^"]*"|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>/gixsm',
!is_string($replaceTo) ? '' : $replaceTo,
$string
);
}
My regex: /<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|"[^"]*"|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>/gixsm
I tested it with http://gskinner.com/RegExr/ and http://regexpal.com/ and it worked well. But when I run it on the server, it always returns this error:
Warning: preg_replace(): Unknown modifier '\'
I have no idea what exactly is wrong with '\'. Can someone please help me out?
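In PHP regular expressions you need to escape the delimiter wherever it occurs literally inside the expression.
In your case, there are two unescaped / characters; simply replace them with \/. You also don't need that string of modifiers: g is not a valid modifier in PHP, because preg_replace() already replaces every match by default; x, s and m change nothing here, because the pattern has no literal whitespace, no dot outside a character class, and no anchors; and the pattern contains no literal letters for i to act on (here i only makes the backreference \1 compare case-insensitively).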
Before:
/<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|"[^"]*"|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>/gixsm
After:
/<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|"[^"]*"|[\w\-.:]+))?)*\s*\/?>\s*<\/\1\s*>/
// the two escaped delimiters are the \/ before ?> and the \/ in the closing tag <\/\1
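An alternative to escaping every slash is to choose a delimiter that does not occur in the pattern at all. The sketch below assumes ~ as the delimiter and uses a hypothetical function name, crl_remove_empty_tags_sketch; the pattern body is copied from the question unchanged, and the modifiers are dropped.

```php
<?php
// Hypothetical variant of the asker's function (the name is an assumption).
// With "~" as the delimiter, "/" is an ordinary character and needs no escaping.
function crl_remove_empty_tags_sketch($string, $replaceTo = null)
{
    // Return the input untouched if it is not a string or is blank.
    if (!is_string($string) || trim($string) == '') return $string;

    return preg_replace(
        '~<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|"[^"]*"|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>~',
        is_string($replaceTo) ? $replaceTo : '',
        $string
    );
}

// Usage: the empty <span> is removed, the non-empty <p> is kept.
echo crl_remove_empty_tags_sketch('<p>Hi</p><span class="x"> </span>'); // prints <p>Hi</p>
```

Note that a single call removes only the innermost empty tags; nested empty tags need repeated passes or a recursive pattern.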
This pattern removes "empty tags" (i.e. non-self-closing tags that contain nothing but whitespace, HTML comments, or other "empty tags"), even when these tags are nested, like <span><span></span></span>. Tags inside HTML comments are not taken into account:
$pattern = <<<'EOD'
~
<
(?:
!--[^-]*(?:-(?!->)[^-]*)*-->[^<]*(*SKIP)(*F) # skip comments
|
( # group 1
(\w++) # tag name in group 2
[^"'>]* #'"# all that is not a quote or a closing angle bracket
(?: # quoted attributes
"[^\\"]*(?:\\.[^\\"]*)*+" [^"'>]* #'"# double quote
|
'[^\\']*(?:\\.[^\\']*)*+' [^"'>]* #'"# single quote
)*+
>
\s*
(?:
<!--[^-]*(?:-(?!->)[^-]*)*+--> \s* # html comments
|
<(?1) \s* # recursion with the group 1
)*+
</\2> # closing tag
) # end of the group 1
)
~sxi
EOD;
$html = preg_replace($pattern, '', $html);
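To see the recursion idea in isolation, here is a much smaller sketch. It is not the pattern above: it ignores HTML comments and quoted attribute values containing ">", and the function name is hypothetical. It only illustrates how a recursive call, here (?R), lets one pass match an empty tag whose content is itself made of empty tags.

```php
<?php
// Simplified illustration of regex recursion (assumption: no comments,
// no ">" inside attribute values). The function name is hypothetical.
function remove_nested_empty_tags_sketch(string $html): string
{
    // <(\w+)[^>]*>   opening tag, name captured in group 1
    // \s*            optional whitespace
    // (?:(?R)\s*)*   zero or more nested empty tags (whole pattern recursed)
    // </\1>          closing tag with the same name as the opening tag
    return preg_replace('~<(\w+)[^>]*>\s*(?:(?R)\s*)*</\1>~', '', $html);
}

// Usage: the nested empty <div><span> pair disappears in a single pass.
echo remove_nested_empty_tags_sketch('<div><span> </span></div><p>x</p>'); // prints <p>x</p>
```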
Limitations:
<script src="myscript.js"></script>
var myvar="<span></span>";
var myvar1="<span><!--";
function doSomething() { alert("!!!"); }
var myvar2="--></span>";
These limitations exist because a purely text-based approach cannot distinguish HTML from JavaScript code. It is possible to solve this by adding "script" tags to the pattern's skip list (in the same way as HTML comments), but then you basically need to describe the JavaScript content (strings, comments, regex literals, and everything that is none of those three), which is not a trivial task, though it is possible.
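As a rough sketch of the skip-list idea, the example below uses (*SKIP)(*F) to step over whole script blocks before looking for empty tags. It is deliberately naive: it assumes a script block ends at the first </script> and does not parse the JavaScript inside it, and the empty-tag part is a simplified pattern rather than the full one above. The function name is hypothetical.

```php
<?php
// Naive sketch: script blocks are matched first and then discarded with
// (*SKIP)(*F), so the engine resumes searching after the closing </script>.
// Assumption: the first "</script>" ends the block (no JavaScript parsing).
function remove_empty_tags_skip_scripts_sketch(string $html): string
{
    $pattern = '~<script\b[^>]*>.*?</script>(*SKIP)(*F)'  // skip script blocks
             . '|<(\w+)[^>]*>\s*</\1>~is';                // simple empty tag
    return preg_replace($pattern, '', $html);
}

// Usage: the script block survives, the empty <i> is removed.
echo remove_empty_tags_skip_scripts_sketch('<script>var s="<span></span>";</script><i> </i>');
// prints <script>var s="<span></span>";</script>
```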
Remove empty elements, then the elements that become empty as a result.
For example:
<p>Hello!
<div class="foo"><p id="nobody">
</p>
</div>
</p>
Result (apart from leftover whitespace):
<p>Hello!</p>
PHP code:
/* $html holds the HTML content */
do {
$tmp = $html;
$html = preg_replace( '#<([^ >]+)[^>]*>([[:space:]]| )*</\1>#', '', $html );
} while ( $html !== $tmp );
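The loop above can be wrapped in a function and traced on the example input. This is a minimal sketch with a hypothetical function name; the pattern is taken from the loop as written. Each pass removes only the innermost empty elements, so the loop repeats until a pass changes nothing.

```php
<?php
// Hypothetical wrapper around the loop (the function name is an assumption).
function strip_empty_elements_sketch(string $html): string
{
    do {
        $before = $html;
        // One pass: remove elements that contain only whitespace.
        $html = preg_replace('#<([^ >]+)[^>]*>([[:space:]]| )*</\1>#', '', $html);
    } while ($html !== $before); // stop once a pass leaves the string unchanged
    return $html;
}

$input = "<p>Hello!\n<div class=\"foo\"><p id=\"nobody\">\n</p>\n</div>\n</p>";
// Pass 1 removes the inner <p>, pass 2 removes the now-empty <div>,
// pass 3 changes nothing and ends the loop.
$output = strip_empty_elements_sketch($input);
```

For this input, $output is "<p>Hello!\n\n</p>": the empty elements are gone, but the newlines that surrounded them remain inside the outer paragraph.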