
Strip HTML tags and their contents

I'm using DOM to parse a string. I need a function that strips span tags and their contents. For example, if I have:

This is some text that contains photo.
<span class='title'> photobyile</span>

I would like the function to return:

This is some text that contains photo.

This is what I tried:

    $dom = new domDocument;
    $dom->loadHTML($string);
    $dom->preserveWhiteSpace = false;
    $spans = $dom->getElementsByTagName('span');

    foreach($spans as $span)
    {
        $naslov = $span->nodeValue; 
        echo $naslov;

        $string = preg_replace("/$naslov/", " ", $string);
    }

I'm aware that $span->nodeValue returns the text content of the span and not the whole tag, but I don't know how to get the whole tag, including the class name.

Thanks, Ile

ilija veselica asked Oct 04 '09 10:10



2 Answers

Try removing the spans directly from the DOM tree.

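    $dom = new DOMDocument();
    $dom->loadHTML($string);
    $dom->preserveWhiteSpace = false;

    // getElementsByTagName() returns a live list: a removed node also drops
    // out of the list, so keep taking item(0) until no spans are left.
    $elements = $dom->getElementsByTagName('span');
    while ($span = $elements->item(0)) {
        $span->parentNode->removeChild($span);
    }

    echo $dom->saveHTML();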
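
Note that saveHTML() called without an argument serializes the whole document, including the doctype and the html/body wrappers that loadHTML() adds. If only the remaining text is wanted, as in the question, the same loop can be wrapped in a function. The following is a minimal sketch, assuming PHP 7 or later; stripSpans is a hypothetical helper name, not part of the DOM extension, and non-ASCII input may additionally need an encoding hint that is not handled here.

```php
<?php
// Hypothetical helper: removes every <span> (tag and contents) from an HTML
// fragment and returns the text that is left.
function stripSpans(string $html): string
{
    if ($html === '') {
        return '';                       // nothing to parse
    }

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The node list is live, so always remove the current first element.
    $spans = $dom->getElementsByTagName('span');
    while ($span = $spans->item(0)) {
        $span->parentNode->removeChild($span);
    }

    // textContent concatenates the text nodes that remain in the document.
    return trim($dom->documentElement->textContent);
}

echo stripSpans("This is some text that contains photo.\n<span class='title'> photobyile</span>");
```

Reading textContent also drops any other markup that was in the input; if the remaining tags should be kept, serializing the body's child nodes with saveHTML($node) would be the alternative.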

Lukáš Lalinský answered Sep 21 '22 05:09


@ile - I've had that problem. The index of the foreach iterator keeps incrementing, while calling removeChild() on the DOM also removes the node from the DOMNodeList ($spans), because that list is live. So for every span you remove, the node list shrinks by one element and the foreach counter still advances by one. Net result: it skips the next span.

I'm sure there is a more elegant way, but this is how I did it: I copied the references from the DOMNodeList into a second array, where the removeChild() operation would not remove them.

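    // Copy the node references out of the live DOMNodeList first...
    $nodes = array();
    foreach ($spans as $span) {
        $nodes[] = $span;
    }
    // ...then remove them; the plain array is not changed by removeChild().
    foreach ($nodes as $span) {
        $span->parentNode->removeChild($span);
    }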
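
The copy step above can also be written with iterator_to_array(), which takes a snapshot of the live list in one call. This is only a sketch of that variation, not the code from the answer; the sample markup is made up for illustration.

```php
<?php
$dom = new DOMDocument();
// Illustrative input with three spans (hypothetical sample data).
$dom->loadHTML('<p>one <span>a</span> two <span>b</span> three <span>c</span></p>');

// Snapshot the live DOMNodeList into a plain PHP array.
$spans = iterator_to_array($dom->getElementsByTagName('span'));

// Removing nodes no longer affects the collection being iterated.
foreach ($spans as $span) {
    $span->parentNode->removeChild($span);
}

// Count the spans still present in the document.
$remaining = $dom->getElementsByTagName('span')->length;
echo $remaining, "\n";
```

Because the array is an independent copy, no span is skipped regardless of how many the document contains.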

kander answered Sep 19 '22 05:09