I need to remove all <script> tags, together with their content, from a PHP string fetched with file_get_contents from an arbitrary website URL. I'm using this regular expression:
preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', "", $string);
It works fine in general, but it fails when a script contains a CDATA section. An example string would be:
<script type='text/javascript'>
/* <![CDATA[ */
var variable = {"ajax":"....."}
/* ]]> */
</script>
I guess the problem is with those "/*" and "*/" comment markers.
I've already searched Google and Stack Overflow, but there is no question about that particular style of CDATA section (wrapped in /* and */), so nothing I found works.
Any suggestion?
Edit:
Following Steve's answer, I am now using code like this:
foreach($dom->getElementsByTagName('script') as $scripttag){
$scripttag->parentNode->removeChild($scripttag);
}
And then I have:
foreach($dom->getElementsByTagName('ins') as $string) {
$string2 .= $string->nodeValue;
$string2 .= ' ';
}
But that returns a $string2 that still contains script content.
EDIT 2 (SOLVED): With Steve's help, I found out that using XPath solves the problem:
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//script') as $node) {
$node->parentNode->removeChild($node);
}
That also removes script tags nested inside another tag. For example:
<ins><script>First JS</script></ins>
<ins>Hello</ins>
<script>Second JS</script>
will output:
Hello
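For anyone who wants to try this end to end, here is a minimal self-contained sketch of the XPath approach applied to the example above. The function name textOfInsWithoutScripts is my own (hypothetical), and I'm assuming libxml warnings about invalid markup should simply be suppressed:

```php
<?php
// Hypothetical helper: strips <script> elements, then joins the text of all <ins> elements.
function textOfInsWithoutScripts(string $html): string
{
    // Collect libxml warnings internally instead of emitting them (real-world HTML is rarely valid).
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // '//script' matches script elements at any depth, including those nested inside <ins>.
    foreach ($xpath->query('//script') as $node) {
        $node->parentNode->removeChild($node);
    }

    $parts = [];
    foreach ($dom->getElementsByTagName('ins') as $ins) {
        $text = trim($ins->nodeValue);
        if ($text !== '') {
            $parts[] = $text; // skip <ins> elements that only contained a script
        }
    }
    return implode(' ', $parts);
}

$html = "<ins><script>First JS</script></ins>\n<ins>Hello</ins>\n<script>Second JS</script>";
echo textOfInsWithoutScripts($html); // Hello
```

Unlike my original loop, this skips empty ins elements rather than appending a blank separator for them.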
Thank you all for the help!
Don't use regex for this; use a proper HTML parser such as DOMDocument:
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);
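//getElementsByTagName() returns a live node list, so removing nodes while iterating forwards skips elements; traverse backwards instead:
$elements = $dom->getElementsByTagName('script');
//count down to index 0 inclusive so the first script is removed too (and nothing happens when there are none)
for ($i = $elements->length - 1; $i >= 0; $i--) {
    $script = $elements->item($i);
    $script->parentNode->removeChild($script);
}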
//you can do further dom manipulation here if needed
$insertContents='';
foreach($dom->getElementsByTagName('ins') as $insert){
$insertContents .= $insert->nodeValue . ' ';
}
//if you need the complete html at all:
$html = $dom->saveHTML();
//your desired string:
echo $insertContents;
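If the backwards loop feels awkward, one alternative is to copy the live node list into a plain array before removing anything. This is only a sketch of that idea, not part of the code above; the helper name removeScripts is hypothetical:

```php
<?php
// Hypothetical helper: removes every <script> element from the document in place
// and returns how many were removed.
function removeScripts(DOMDocument $dom): int
{
    // Snapshot the live node list into an array first, so removals
    // cannot shift the collection while we loop over it.
    $scripts = iterator_to_array($dom->getElementsByTagName('script'));
    foreach ($scripts as $script) {
        $script->parentNode->removeChild($script);
    }
    return count($scripts);
}

$dom = new DOMDocument();
$dom->loadHTML('<div><script>a</script><p>kept</p><script>b</script></div>');
echo removeScripts($dom), "\n";                                  // 2
echo $dom->getElementsByTagName('script')->length, "\n";         // 0
```

The array holds its own references to the nodes, so a plain foreach is safe here.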