
How to remove empty HTML tags (including tags that contain only whitespace and/or its HTML codes)

I need a regex for preg_replace.

This question wasn't answered in "another question" because not all of the tags I want to remove are strictly empty.

I need to remove not only empty tags from an HTML structure, but also tags that contain only line breaks, whitespace, and/or the HTML codes for whitespace.

The possible codes are:

<br /> &nbsp; &thinsp; &ensp; &emsp; &#8201; &#8194; &#8195;

BEFORE removing matching tags:

<div> 
  <h1>This is a html structure.</h1> 
  <p>This is not empty.</p> 
  <p></p> 
  <p><br /></p>
  <p> <br /> &thinsp;</p>
  <p>&nbsp;</p> 
  <p> &nbsp; </p> 
</div>

AFTER removing matching tags:

<div> 
  <h1>This is a html structure.</h1> 
  <p>This is not empty.</p> 
</div>

Ditte Berlin asked Feb 09 '23 20:02


2 Answers

You can use the following:

<([^>\s]+)[^>]*>(?:\s*(?:<br \/>|&nbsp;|&thinsp;|&ensp;|&emsp;|&#8201;|&#8194;|&#8195;)\s*)*<\/\1>

Replace each match with '' (the empty string).

Note: This will also work for empty html tags with attributes.
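
As a rough sketch of how the pattern above could be passed to preg_replace: the helper name removeEmptyTags() and the repeat-until-stable loop are assumptions added here for illustration, not part of the answer. The loop only exists so that a wrapper left empty by an earlier pass is removed as well.

```php
<?php
// Sketch only: removeEmptyTags() is a hypothetical helper name.
function removeEmptyTags(string $html): string
{
    // The pattern from the answer, wrapped in ~ delimiters for PCRE.
    $pattern = '~<([^>\s]+)[^>]*>(?:\s*(?:<br \/>|&nbsp;|&thinsp;|&ensp;|&emsp;|&#8201;|&#8194;|&#8195;)\s*)*<\/\1>~';

    // Assumption: repeat until nothing matches, so that a wrapper left
    // empty by an earlier pass (e.g. <div><p></p></div>) is removed too.
    do {
        $html = preg_replace($pattern, '', $html, -1, $count);
    } while ($count > 0);

    return $html;
}

// The BEFORE markup from the question.
$html = <<<'HTML'
<div>
  <h1>This is a html structure.</h1>
  <p>This is not empty.</p>
  <p></p>
  <p><br /></p>
  <p> <br /> &thinsp;</p>
  <p>&nbsp;</p>
  <p> &nbsp; </p>
</div>
HTML;

echo removeEmptyTags($html);
```

Only the tags themselves are removed, so the indentation and newlines around them stay behind and the output differs from the AFTER listing in whitespace. Also note that the pattern as written does not match a tag that holds nothing but plain whitespace, such as `<p> </p>`, because every repetition of the inner group requires one of the listed codes.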

karthik manchala answered Feb 12 '23 11:02


Use tidy, for example through the following function:

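function cleaning($string, $tidyConfig = null) {
    $out = array ();
    // Default tidy options, used when the caller supplies none.
    $config = array (
            'indent' => true,
            'show-body-only' => false,
            'clean' => true,
            'output-xhtml' => true,
            'preserve-entities' => true 
    );
    if ($tidyConfig === null) {
        $tidyConfig = $config;
    }
    // Repair the markup; the document is treated as UTF-8.
    $tidy = new tidy ();
    $out ['full'] = $tidy->repairString ( $string, $tidyConfig, 'UTF8' );
    unset ( $tidy );
    // Keep only what sits between <body> and </body>.
    $out ['body'] = preg_replace ( "/.*<body[^>]*>|<\/body>.*/si", "", $out ['full'] );
    // Collect the <style> block from the repaired document, if there is one.
    // Without this check, a document with no <style> element would be copied into 'style' whole.
    $out ['style'] = preg_match ( "/<style[^>]*>(.*?)<\/style>/si", $out ['full'], $m ) ? '<style type="text/css">' . $m [1] . '</style>' : '';
    return $out;
}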

Identity1 answered Feb 12 '23 11:02