Regex to remove all empty HTML tags

Question

This is my PHP function to remove all empty HTML tags from an input string:

/**
 * Remove the nested HTML empty tags from the string.
 *
 * @param $string String to remove tags
 * @param null $replaceTo Replace empty string with
 * @return mixed Cleaned string
 */
function crl_remove_empty_tags($string, $replaceTo = null)
{
    // Return if string not given or empty
    if (!is_string($string) || trim($string) == '') return $string;

    // Recursive empty HTML tags
    return preg_replace(
        '/<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|"[^"]*"|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>/gixsm',
        !is_string($replaceTo) ? '' : $replaceTo,
        $string
    );
}

My regex: /<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|"[^"]*"|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>/gixsm

I tested it with http://gskinner.com/RegExr/ and http://regexpal.com/, and it worked well. But when I tried to run it, the server always returned this error:

Warning: preg_replace(): Unknown modifier '\'

I have no idea what exactly is wrong with '\'. Could someone please help me out?

brandonscript · Accepted Answer

In PHP regular expressions, you need to escape the delimiter wherever it occurs literally within the expression.

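In your case, you have two unescaped /; simply replace them with \/. You also don't need the list of modifiers: g is not a valid modifier in PHP, because preg_replace() already replaces every match, and the pattern contains no literal letters for i to act on (i would only matter for mixed-case pairs such as <P></p>, through the backreference \1).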

Before:

/<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|"[^"]*"|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>/gixsm

After:

/<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|"[^"]*"|[\w\-.:]+))?)*\s*\/?>\s*<\/\1\s*>/
//                                                                    ^       ^
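
As a side note, PHP accepts other delimiter characters, which avoids the escaping altogether. The following is only a sketch of that idea, not part of the original answer: it reuses the pattern above with ~ as the delimiter, and the sample $html string is made up for illustration.

```php
<?php
// Same pattern as above, but delimited with "~" instead of "/",
// so the two literal slashes need no backslash.
$pattern = '~<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|"[^"]*"|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>~';

// Illustrative input (an assumption, not taken from the question).
$html = '<p>Hello</p><span class="x"> </span><div></div>';

$clean = preg_replace($pattern, '', $html);
echo $clean, PHP_EOL;
// prints: <p>Hello</p>
```

Both the span and the div are removed in a single call even though there is no g modifier. Note that one call does not handle nesting: an element that only becomes empty after its children are removed needs another pass.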

Casimir et Hippolyte · Answer

This pattern removes "empty tags" (i.e. non-self-closing tags that contain nothing but whitespace, HTML comments, or other "empty tags"), even when these tags are nested, as in <span><span></span></span>. Tags inside HTML comments are not taken into account (they are skipped):

$pattern = <<<'EOD'
~
<
(?:
    !--[^-]*(?:-(?!->)[^-]*)*-->[^<]*(*SKIP)(*F) # skip comments
  |
    ( # group 1
        (\w++)     # tag name in group 2
        [^"'>]* #'"# all that is not a quote or a closing angle bracket
        (?: # quoted attributes
            "[^\"]*(?:\.[^\"]*)*+" [^"'>]* #'"# double quote
          |
            '[^\']*(?:\.[^\']*)*+' [^"'>]* #'"# single quote
        )*+
        >
        \s*
        (?:
            <!--[^-]*(?:-(?!->)[^-]*)*+--> \s* # html comments
          |
            <(?1) \s*                          # recursion with the group 1
        )*+
        </\2> # closing tag
    ) # end of the group 1
)
~sxi
EOD;

$html = preg_replace($pattern, '', $html);

Limitations:

This approach will also remove script elements that load external JavaScript files, because they have no content:
<script src="myscript.js"></script>
The pattern may remove part of embedded JavaScript code if it contains something like:
var myvar="<span></span>";
or like:
var myvar1="<span></span>";

These limitations arise because a purely text-based approach cannot distinguish between HTML and JavaScript code. It is possible to solve this problem by adding "script" tags to the pattern's skip list (in the same way as HTML comments), but then you need to describe the JavaScript content itself (strings, comments, regex literals, and everything that is none of those three), which is not a trivial task but is possible.
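
A much cruder variant of the skip-list idea can be sketched as follows. Instead of describing the JavaScript content, it assumes that a script element ends at the first </script> and skips the whole element with (*SKIP)(*F), the same way comments are skipped. The function name remove_empty_tags_keep_scripts, the simplified empty-tag branch (which does not handle > inside attribute values or comments inside empty tags) and the loop are illustrative assumptions, not the pattern from this answer.

```php
<?php
// Hypothetical helper: the name and the simplified pattern are illustrative only.
function remove_empty_tags_keep_scripts(string $html): string
{
    $pattern = '~
        <script\b[^>]*> .*? </script\s*> (*SKIP)(*F)  # skip whole script elements
      |
        <!-- .*? --> (*SKIP)(*F)                      # skip HTML comments
      |
        <(\w+)\b[^>]*> \s* </\1\s*>                   # element holding only whitespace
    ~six';

    // Repeat until nothing changes, so that nested empty elements are removed too.
    do {
        $before = $html;
        $result = preg_replace($pattern, '', $html);
        if ($result === null) {
            return $before; // regex error: keep the last good value
        }
        $html = $result;
    } while ($html !== $before);

    return $html;
}

echo remove_empty_tags_keep_scripts(
    '<div><script src="a.js"></script><span> </span></div>'
), PHP_EOL;
// prints: <div><script src="a.js"></script></div>
```

With this sketch, both the external script element and any markup inside embedded script strings are left untouched, at the cost of never cleaning anything inside a script element.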

Alejandro Salamanca Mazuelo · Answer

Remove the empty elements, then the elements that become empty as a result.

For example:

<p>Hello!
   <div class="foo"><p id="nobody">
   </p>
      </div>
 </p>

Results:

<p>Hello!</p>

PHP code:

/* $html holds the HTML content */
do {
    $tmp = $html;
    $html = preg_replace( '#<([^ >]+)[^>]*>([[:space:]]|&nbsp;)*</\1>#', '', $html );
} while ( $html !== $tmp );
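
For convenience, the loop can be wrapped in a function. This is a minimal sketch: the name strip_empty_elements and the null check are additions for illustration, and the sample input is a single-line version of the example above.

```php
<?php
// Hypothetical wrapper around the loop above; the function name is illustrative.
function strip_empty_elements(string $html): string
{
    do {
        $tmp = $html;
        $next = preg_replace('#<([^ >]+)[^>]*>([[:space:]]|&nbsp;)*</\1>#', '', $html);
        if ($next === null) {
            return $tmp; // preg_replace() returns null on error; keep the last good value
        }
        $html = $next;
    } while ($html !== $tmp);

    return $html;
}

echo strip_empty_elements('<p>Hello!<div class="foo"><p id="nobody"> </p></div></p>'), PHP_EOL;
// prints: <p>Hello!</p>
```

Each pass removes the innermost empty elements, so the loop runs once per nesting level plus a final pass that detects no change. Whitespace left between a removed element and the surrounding text is not trimmed.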

Tags: html, regex, php

Manhhailua
