
PHP - Remove all content inside <script> and CDATA of HTML string

I need to remove all content (and the tags themselves) between <script> tags in a PHP string fetched with file_get_contents from a generic website URL. I'm using this regular expression:

preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', "", $string);

It works fine in general, but my problem is that if a script contains a CDATA tag, it doesn't work. An example string would be:

<script type='text/javascript'>
/* <![CDATA[ */
var variable = {"ajax":"....."}
/* ]]> */
</script>

I guess that the problem is with those "/*" and "*/" markers around the CDATA tags.
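
One way to test that guess is to run the pattern against the sample on its own. The sketch below is only an illustration: the surrounding <p> elements are made up for the test, and the null check assumes nothing about the real page. On this sample the pattern does match across the comment markers, because the s modifier lets the dot match newlines, so the failure on a full page may have a different cause that this sketch does not establish.

```php
<?php
// Hypothetical sample: the <script> block from the question, wrapped in two made-up paragraphs.
$string = <<<'HTML'
<p>before</p>
<script type='text/javascript'>
/* <![CDATA[ */
var variable = {"ajax":"....."}
/* ]]> */
</script>
<p>after</p>
HTML;

// Same pattern as in the question.
$result = preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', "", $string);

if ($result === null) {
    // preg_replace() returns null when PCRE itself fails; preg_last_error() gives the error code.
    echo 'PCRE error code: ' . preg_last_error() . PHP_EOL;
} else {
    echo $result . PHP_EOL;
}
```

If $result comes back as null on the real page, the return value of preg_last_error() is a reasonable next thing to look at.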


I've already searched on Google and on Stack Overflow, but there is no question about that particular type of CDATA tag (wrapped in /* and */), so nothing I found works.

Any suggestions?

Edit: As Steve answered, I am now using code like this:

foreach($dom->getElementsByTagName('script') as $scripttag){
    $scripttag->parentNode->removeChild($scripttag);
}

And then I have:

foreach($dom->getElementsByTagName('ins') as $string) {
    $string2 .= $string->nodeValue;
    $string2 .= ' ';
}

But that returns a $string2 with script tags inside.

EDIT 2 (SOLVED): With Steve's help, I found out that using XPath solves the problem:

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//script') as $node) {
    $node->parentNode->removeChild($node);
}

That also removes script tags nested inside another tag. For example:

<ins><script>First JS</script></ins>
<ins>Hello</ins>
<script>Second JS</script>

will output:

Hello
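
For anyone who wants to reproduce this, here is a minimal self-contained sketch that puts the pieces above together. It assumes the three sample lines are the whole input, and the libxml_use_internal_errors() call is my addition to keep parser warnings about incomplete markup quiet; the variable names are only illustrative.

```php
<?php
// Sample markup from the example above, used as the entire input (an assumption).
$html = "<ins><script>First JS</script></ins>\n<ins>Hello</ins>\n<script>Second JS</script>";

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// query() returns a static list, so removing nodes while looping does not skip any.
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//script') as $node) {
    $node->parentNode->removeChild($node);
}

// Concatenate the text of every <ins>, as in the question.
$text = '';
foreach ($dom->getElementsByTagName('ins') as $ins) {
    $text .= $ins->nodeValue . ' ';
}
echo trim($text) . PHP_EOL;
```

The trim() call only strips the padding spaces that the concatenation adds around the text.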

Thank you all for the help!

Tek Litto asked Feb 04 '26 00:02


1 Answer

Don't use regex for this; use a proper HTML parser like DOMDocument:

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);
//getElementsByTagName() returns a live list that shrinks as nodes are removed, so traverse backwards:
$elements = $dom->getElementsByTagName('script');
//count down to index 0 inclusive, so the first <script> is removed too (and nothing runs if there are none)
for($i = $elements->length - 1; $i >= 0; $i--){
    $node = $elements->item($i);
    $node->parentNode->removeChild($node);
}

//you can do further dom manipulation here if needed
$insertContents='';
foreach($dom->getElementsByTagName('ins') as $insert){
    $insertContents .= $insert->nodeValue . ' ';
}
//if you need the complete html at all:
$html = $dom->saveHTML();
//your desired string:
echo $insertContents;
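
As a variation on the backwards loop, the live list can first be copied into a plain array and then iterated forwards. The sketch below is only an illustration under a few assumptions: the input is the CDATA sample from the question, wrapped in a made-up <div> with one <ins>, and libxml_use_internal_errors() is added to keep parser warnings quiet.

```php
<?php
// Hypothetical input: the CDATA sample from the question inside a made-up wrapper.
$html = <<<'HTML'
<div>
<script type='text/javascript'>
/* <![CDATA[ */
var variable = {"ajax":"....."}
/* ]]> */
</script>
<ins>Hello</ins>
</div>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Snapshot the live node list so removals cannot shift the items still to be visited.
$scripts = iterator_to_array($dom->getElementsByTagName('script'));
foreach ($scripts as $script) {
    $script->parentNode->removeChild($script);
}

$remaining = $dom->getElementsByTagName('script')->length;
$clean = $dom->saveHTML();
echo $clean;
```

Because the whole <script> element is removed as a node, the /* <![CDATA[ */ markers inside it need no special handling.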

Steve answered Feb 06 '26 13:02


