
PHP Remove JavaScript

Tags:

javascript

php

I am trying to remove JavaScript from the HTML.

I can't get the regular expression to work with PHP; it's giving me a null array. Why?

<?php
$var = '
<script type="text/javascript"> 
function selectCode(a) 
{ 
   var e = a.parentNode.parentNode.getElementsByTagName(PRE)[0]; 
   if (window.getSelection) 
   { 
      var s = window.getSelection(); 
       if (s.setBaseAndExtent) 
      { 
         s.setBaseAndExtent(e, 0, e, e.innerText.length - 1); 
      } 
      else 
      { 
         var r = document.createRange(); 
         r.selectNodeContents(e); 
         s.removeAllRanges(); 
         s.addRange(r); 
      } 
   } 
   else if (document.getSelection) 
   { 
      var s = document.getSelection(); 
      var r = document.createRange(); 
      r.selectNodeContents(e); 
      s.removeAllRanges(); 
      s.addRange(r); 
   } 
   else if (document.selection) 
   { 
      var r = document.body.createTextRange(); 
      r.moveToElementText(e); 
      r.select(); 
   } 
} 
</script>
';

   function remove_javascript($java){
   echo preg_replace('/<script\b[^>]*>(.*?)<\/script>/i', "", $java);

   }    
?>

Saxtor asked Dec 11 '09 09:12


5 Answers

This should do it:

echo preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', "", $var); 

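The s modifier after the closing delimiter makes the dot (.) match newlines too. Without it, .*? cannot cross a line break, so a script block that spans several lines never matches.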
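
To see the difference, here is a small comparison of the two patterns on a multi-line block. The sample markup in $html is made up for illustration; only the two regular expressions come from the question and this answer.

```php
<?php
// Made-up sample input with a script block that spans several lines.
$html = "<p>before</p>\n<script>\nalert(1);\n</script>\n<p>after</p>";

// Pattern from the question (no s modifier): "." stops at the newline,
// so nothing matches and the input comes back unchanged.
$withoutS = preg_replace('/<script\b[^>]*>(.*?)<\/script>/i', '', $html);

// Pattern from this answer (with s): "." also matches newlines,
// so the whole block is removed.
$withS = preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', '', $html);

var_dump($withoutS === $html); // bool(true)
var_dump($withS);
```

Only the second call strips the block; the first one returns the string exactly as it went in.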

Just a warning: you should not use this type of regexp to sanitize user input for a website. There are just too many ways to get around it. For sanitizing, use something like the HTML Purifier library (http://htmlpurifier.org/).
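
As a rough illustration of how easy it is to get around, the sketch below runs three inputs through the same regex. The wrapper name strip_script_blocks and the sample inputs are my own, not part of the question; each input comes back untouched.

```php
<?php
// Hypothetical wrapper around the regex from this answer.
function strip_script_blocks(string $html): string
{
    return preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', '', $html) ?? $html;
}

// Made-up inputs that can still run JavaScript in a browser.
$inputs = [
    '<img src="x" onerror="alert(1)">',       // inline event handler, no script tag at all
    '<a href="javascript:alert(1)">link</a>', // javascript: URL
    '<script>alert(1)</script >',             // whitespace before the closing ">"
];

foreach ($inputs as $input) {
    // Prints bool(true) each time: the regex changed nothing.
    var_dump(strip_script_blocks($input) === $input);
}
```

These are only a few examples, which is why a dedicated sanitizer is the safer choice for untrusted input.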

Tjofras answered Nov 08 '22 22:11


This might do more than you want, but depending on your situation you might want to look at strip_tags.
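
One thing to watch for, shown in this quick sketch with a made-up input string: strip_tags removes every tag, but it keeps the text between the tags, so the script source itself stays behind as plain text.

```php
<?php
// Made-up sample input.
$html = '<p>Hello</p><script>alert(1)</script>';

// All tags are dropped, including <p>, but the text inside <script> survives.
$text = strip_tags($html);

var_dump($text); // string(13) "Helloalert(1)"
```

So it may be a good fit if you want plain text anyway, and a poor one if you need to keep the rest of the markup or get rid of the script body.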

deceze answered Nov 09 '22 00:11


Here's an idea

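// Repeatedly cut out everything from "<script" through the next "</script>".
// Compare against false explicitly: strpos() returns 0 for a match at the start.
while (($beginning = strpos($var, "<script")) !== false) {
  $end = strpos($var, "</script>", $beginning);
  if ($end === false) {
    break; // unclosed tag: stop instead of looping forever
  }
  $stringLength = ($end + strlen("</script>")) - $beginning;
  // substr_replace() returns the new string; it does not modify $var in place.
  $var = substr_replace($var, "", $beginning, $stringLength);
}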

bng44270 answered Nov 08 '22 22:11


In your case you could regard the string as a list of newline-delimited strings and remove the lines containing the script tags (the first and the second to last); you wouldn't even need regular expressions.
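
A minimal sketch of one way to read this. Removing only the two tag lines would leave the function body behind as text, so I am assuming the goal is to drop every line from the opening tag through the closing tag. The function name drop_script_lines is made up, and anything else that shares a line with a tag is dropped as well.

```php
<?php
// Hypothetical helper: drops every line from one containing "<script"
// through the next line containing "</script>", inclusive.
function drop_script_lines(string $html): string
{
    $kept = [];
    $insideScript = false;

    foreach (explode("\n", $html) as $line) {
        // An opening tag on this line starts a block to skip.
        if (!$insideScript && stripos($line, '<script') !== false) {
            $insideScript = true;
        }
        if (!$insideScript) {
            $kept[] = $line;
        }
        // A closing tag on this line ends the block; the line itself is still skipped.
        if ($insideScript && stripos($line, '</script>') !== false) {
            $insideScript = false;
        }
    }

    return implode("\n", $kept);
}

// Made-up sample input.
$cleaned = drop_script_lines("<p>a</p>\n<script>\nalert(1);\n</script>\n<p>b</p>");
var_dump($cleaned);
```

This only works when the markup is laid out line by line like the string in the question; it is not a general HTML parser.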

Though if you are trying to prevent XSS, removing only the script tags might not be sufficient.

tosh answered Nov 08 '22 22:11


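function clean_jscode($script_str) {
    // Decode entities so that escaped tags such as &lt;script&gt; are caught too.
    // Note: this also decodes entities in the rest of the markup.
    $script_str = htmlspecialchars_decode($script_str);
    // Normalize the case of the tags (e.g. <SCRIPT becomes <script).
    $search_arr = array('<script', '</script>');
    $script_str = str_ireplace($search_arr, $search_arr, $script_str);
    $split_arr = explode('<script', $script_str);
    $remove_jscode_arr = array();
    foreach ($split_arr as $key => $val) {
        // Chunk 0 precedes the first <script; later chunks keep only what follows </script>.
        $newarr = explode('</script>', $val);
        // If the closing tag is missing, drop the chunk instead of reading an undefined index.
        $remove_jscode_arr[] = ($key == 0) ? $newarr[0] : ($newarr[1] ?? '');
    }
    return implode('', $remove_jscode_arr);
}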

Soe Min Thu answered Nov 08 '22 22:11