I have a variable named $articleText that contains HTML code. There is script and style code within <script> and <style> HTML elements. I want to scan $articleText and remove these pieces of code. If I can also remove the actual HTML elements <script>, </script>, <style> and </style>, I would do that too.
I imagine I need to use regex; however, I am not skilled in it.
Can anyone assist?
I wish I could provide some code, but like I said, I am not skilled in regex, so I don't have anything to show.
Do not use regex on HTML. PHP provides a tool for parsing DOM structures, appropriately called DOMDocument.
<?php
// some HTML for example
$myHtml = '<html><head><script>alert("hi mom!");</script></head><body><style>body { color: red;} </style><h1>This is some content</h1><p>content is awesome</p></body><script src="someFile.js"></script></html>';

// create a new DOMDocument object
$doc = new DOMDocument();

// load the HTML into the DOMDocument object (this would be your source HTML)
$doc->loadHTML($myHtml);

// remove every <script>, <style> and <link> element, including its contents
removeElementsByTagName('script', $doc);
removeElementsByTagName('style', $doc);
removeElementsByTagName('link', $doc);

// output cleaned html
echo $doc->saveHTML();

function removeElementsByTagName($tagName, $document) {
    $nodeList = $document->getElementsByTagName($tagName);
    // the node list is live, so iterate backwards while removing nodes
    for ($nodeIdx = $nodeList->length; --$nodeIdx >= 0; ) {
        $node = $nodeList->item($nodeIdx);
        $node->parentNode->removeChild($node);
    }
}
You can try it here: https://eval.in/private/4f225fa0dcb4eb
Documentation
- DOMDocument: http://php.net/manual/en/class.domdocument.php
- DOMNodeList: http://php.net/manual/en/class.domnodelist.php
- DOMDocument::getElementsByTagName: http://us3.php.net/manual/en/domdocument.getelementsbytagname.php
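Since the question is about a variable holding article HTML rather than a full page, the same DOM approach can be sketched for a fragment. This is only an illustration under some assumptions: the helper name removeScriptsAndStyles is made up for this example, the input is assumed to be ASCII or to declare its own charset (loadHTML() does not assume UTF-8), and only the children of the <body> that loadHTML() creates are returned.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of the answer above):
// removes <script> and <style> elements from an HTML fragment.
function removeScriptsAndStyles(string $html): string
{
    // loadHTML() rejects an empty string, so handle that case up front.
    if ($html === '') {
        return '';
    }

    $doc = new DOMDocument();

    // Collect parser warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach (['script', 'style'] as $tagName) {
        $nodeList = $doc->getElementsByTagName($tagName);
        // The node list is live, so walk it backwards while removing nodes.
        for ($i = $nodeList->length - 1; $i >= 0; $i--) {
            $node = $nodeList->item($i);
            $node->parentNode->removeChild($node);
        }
    }

    // loadHTML() wraps a fragment in <html><body>; return only the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $result = '';
    foreach ($body->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

$articleText = '<h1>Title</h1><script>alert("hi");</script><p>Some text</p><style>p { color: red; }</style>';
$articleText = removeScriptsAndStyles($articleText);
echo $articleText;
```

Returning only the body's children keeps the result a fragment, so it can be assigned straight back to $articleText without the <html> and <body> wrappers that saveHTML() would otherwise add.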
Although regex is not a good tool for this kind of task, it may work for small, simple cases.
If you want to remove just the inner text of the tag(s), use:
$txt = preg_replace('/(<(script|style)\b[^>]*>).*?(<\/\2>)/is', "$1$3", $txt);
If you also want to remove the tags themselves, use an empty replacement string, "", instead.
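To make the two regex variants concrete, here is a small sketch that runs both on a sample string. The sample markup and the variable names $pattern, $emptied and $removed are made up for illustration; the pattern is the one from the answer. It captures the opening tag as group 1, the tag name as group 2 and the matching closing tag as group 3.

```php
<?php
// Sample input, invented for this example.
$txt = '<p>Keep me</p><script type="text/javascript">alert("hi");</script><style>p { color: red; }</style>';

// i = case-insensitive, s = let "." match newlines inside the tag body.
$pattern = '/(<(script|style)\b[^>]*>).*?(<\/\2>)/is';

// Variant 1: keep the tags, drop only their contents.
$emptied = preg_replace($pattern, '$1$3', $txt);

// Variant 2: drop the tags together with their contents.
$removed = preg_replace($pattern, '', $txt);

echo $emptied, PHP_EOL; // <p>Keep me</p><script type="text/javascript"></script><style></style>
echo $removed, PHP_EOL; // <p>Keep me</p>
```

Note that preg_replace() returns the new string rather than modifying its argument, so the result has to be assigned to a variable.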