When receiving user input on forms, I want to detect whether fields like "username" or "address" contain markup that has a special meaning in XML (RSS feeds) or (X)HTML (when displayed).
So which of these is the correct way to detect whether the input entered doesn't contain any special characters in HTML and XML context?
if (mb_strpos($data, '<') === FALSE AND mb_strpos($data, '>') === FALSE)
or
if (htmlspecialchars($data, ENT_NOQUOTES, 'UTF-8') === $data)
or
if (preg_match("/[^\p{L}\-.']/u", $data)) // problem: also catches symbols
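To make the difference between the three checks concrete, here is a minimal sketch that runs each of them against a few sample values. The wrapper function names and the sample strings are made up for illustration; only the conditions inside them come from the snippets above.

```php
<?php
// Hypothetical wrappers: each returns true when the input is
// considered free of markup by one of the three checks above.

function no_angle_brackets(string $data): bool
{
    return mb_strpos($data, '<') === false && mb_strpos($data, '>') === false;
}

function unchanged_by_escaping(string $data): bool
{
    // ENT_NOQUOTES leaves quotes alone, so only &, < and > change the string.
    return htmlspecialchars($data, ENT_NOQUOTES, 'UTF-8') === $data;
}

function only_name_characters(string $data): bool
{
    // preg_match() returns 1 when a character outside the whitelist is found.
    return preg_match("/[^\p{L}\-.']/u", $data) === 0;
}

// Illustrative sample values, not taken from the question.
$samples = array("O'Brien", "Tom & Jerry", "1 < 2", "Anne-Marie Smith");

foreach ($samples as $sample) {
    printf(
        "%-18s brackets:%s escaping:%s whitelist:%s\n",
        $sample,
        var_export(no_angle_brackets($sample), true),
        var_export(unchanged_by_escaping($sample), true),
        var_export(only_name_characters($sample), true)
    );
}
```

With these samples, "Tom & Jerry" is accepted only by the angle-bracket check (the ampersand changes under htmlspecialchars()), and "Anne-Marie Smith" is rejected only by the whitelist, because a space is not in the allowed set.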
Have I missed anything else, like byte sequences or other tricky ways to get markup tags around things like "javascript:"? As far as I'm aware, all XSS and CSRF attacks require < or > around the values to get the browser to execute the code (well, at least from Internet Explorer 6 or later anyway) - is this correct?
I am not looking for something to reduce or filter input. I just want to locate dangerous character sequences when used in XML or HTML context. (strip_tags() is horribly unsafe. As the manual says, it doesn't check for malformed HTML.)
I think I need to clarify that a lot of people are mistaking this question for a question about basic security via "escaping" or "filtering" dangerous characters. This is not that question, and most of the simple answers given wouldn't solve that problem anyway.
if (mb_strpos($data, '<') === FALSE AND mb_strpos($data, '>') === FALSE)
Now that the data is in my application, I do two things with it: 1) display it in a format like HTML, or 2) display it inside a form element for editing.
The first one is safe in XML and HTML context:
<h2><?php print $input; ?></h2>
<xml><item><?php print $input; ?></item></xml>
The second form is more dangerous, but it should still be safe:
<input value="<?php print htmlspecialchars($input, ENT_QUOTES, 'UTF-8');?>">
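The reason the second form needs htmlspecialchars() even after the angle-bracket check can be shown with a short sketch. The payload below is a made-up example, not one from the cheat sheet: it contains neither < nor >, so it passes the check, yet it would close the value attribute if printed unescaped.

```php
<?php
// Hypothetical payload with no angle brackets at all.
$input = '" onfocus="alert(1)" autofocus="';

// The same condition as the simple check above.
$passes_check = mb_strpos($input, '<') === false
    && mb_strpos($input, '>') === false;

// Unescaped: the first quote ends the value attribute, and the rest
// is read by the browser as additional attributes on the element.
$raw = '<input value="' . $input . '">';

// Escaped with ENT_QUOTES: double quotes become &quot; (and single
// quotes &#039;), so the whole payload stays inside the attribute value.
$escaped = '<input value="' . htmlspecialchars($input, ENT_QUOTES, 'UTF-8') . '">';

var_dump($passes_check);
echo $raw, "\n";
echo $escaped, "\n";
```

So the check alone says nothing about attribute context; it is the escaping at output time that keeps the second form safe.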
You can download the gist I created and run the code as a text or HTML response to see what I'm talking about. This simple check passes the http://ha.ckers.org XSS Cheat Sheet, and I can't find anything that makes it through. (I'm ignoring Internet Explorer 6 and below.)
I started another bounty to award someone that can show a problem with this approach or a weakness in its implementation.
It's the DOM that we want to protect - so why not just ask it? Timur's answer led to this:
function not_markup($string)
{
    libxml_use_internal_errors(true);
    if ($xml = simplexml_load_string("<root>$string</root>")) {
        return $xml->children()->count() === 0;
    }
}

if (not_markup($_POST['title'])) ...
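As written, not_markup() returns nothing (null) when parsing fails, and it relies on the truthiness of the SimpleXMLElement object, which the PHP manual describes as false for an empty element without attributes. A minimal sketch of a variant with explicit boolean results follows; the name contains_no_markup() and the sample inputs are assumptions for illustration, not part of the original approach.

```php
<?php
// Hypothetical variant of not_markup() that always returns a bool.
function contains_no_markup(string $string): bool
{
    // Collect parser errors internally instead of emitting warnings,
    // then restore whatever setting was active before.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string("<root>$string</root>");
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        // Not well-formed once wrapped: stray <, unbalanced tags, bare &.
        return false;
    }

    // Well-formed, so only child elements would indicate markup.
    return $xml->children()->count() === 0;
}

var_dump(contains_no_markup('Plain title'));      // bool(true)
var_dump(contains_no_markup('<b>bold</b>'));      // bool(false)
var_dump(contains_no_markup('<script>alert(1)')); // bool(false)
var_dump(contains_no_markup('Tom & Jerry'));      // bool(false)
```

Note that a bare ampersand is rejected too, since it is not well-formed XML. Constructs that are well-formed but add no child elements, such as comments or CDATA sections, would presumably pass this test and are worth checking separately.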
test. bind(/(<([^>]+)>)/i); It will basically return true for strings containing a < followed by at least one character other than >, followed by a >.
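The same pattern can be tried from PHP with preg_match(). This is only a sketch with made-up sample strings, meant to show where a tag-shaped pattern and "contains markup" disagree.

```php
<?php
// The pattern from the answer above, unchanged.
$pattern = '/(<([^>]+)>)/i';

// Illustrative inputs (hypothetical, chosen to probe the edges).
$samples = array(
    '<p>hi there</p>',                    // a complete tag
    '1 < 2',                              // lone less-than sign
    '1 < 2 and 3 > 2',                    // plain text that looks tag-shaped
    '<script src=//example.invalid/x.js', // tag that is never closed
);

$results = array();
foreach ($samples as $sample) {
    $results[$sample] = preg_match($pattern, $sample) === 1;
    printf("%-36s %s\n", $sample, var_export($results[$sample], true));
}
```

The third sample matches even though it is ordinary text, and the last one does not match because no > follows, so the pattern would need to be combined with another check if unclosed tags matter.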
HTML documents are strings that contain both content and markup. Content looks like: hi there and markup looks like <p> . In HTML they are blended together, so the string <p>hi there</p> tells the browser to display the words hi there on the screen, styled however the browser renders a paragraph.
I don't think you need to implement a huge algorithm to check whether a string has unsafe data - filters and regular expressions do the work. But if you need a more complex check, maybe this will fit your needs:
<?php
$strings = array();
$strings[] = <<<EOD
';alert(String.fromCharCode(88,83,83))//\';alert(String.fromCharCode(88,83,83))//";alert(String.fromCharCode(88,83,83))//\";alert(String.fromCharCode(88,83,83))//--></SCRIPT>">'><SCRIPT>alert(String.fromCharCode(88,83,83))</SCRIPT>
EOD;
$strings[] = <<<EOD
'';!--"<XSS>=&{()}
EOD;
$strings[] = <<<EOD
<SCRIPT SRC=http://ha.ckers.org/xss.js></SCRIPT>
EOD;
$strings[] = <<<EOD
This is a safe text
EOD;
$strings[] = <<<EOD
<IMG SRC="javascript:alert('XSS');">
EOD;
$strings[] = <<<EOD
<IMG SRC=javascript:alert('XSS')>
EOD;
$strings[] = <<<EOD
<IMG SRC=javascript:alert('XSS')>
EOD;
$strings[] = <<<EOD
perl -e 'print "<IMG SRC=java\0script:alert(\"XSS\")>";' > out
EOD;
$strings[] = <<<EOD
<SCRIPT/XSS SRC="http://ha.ckers.org/xss.js"></SCRIPT>
EOD;
$strings[] = <<<EOD
</TITLE><SCRIPT>alert("XSS");</SCRIPT>
EOD;

libxml_use_internal_errors(true);
$sourceXML = '<root><element>value</element></root>';
$sourceXMLDocument = simplexml_load_string($sourceXML);
$sourceCount = $sourceXMLDocument->children()->count();

foreach ($strings as $string) {
    $unsafe = false;
    $XML = '<root><element>' . $string . '</element></root>';
    $XMLDocument = simplexml_load_string($XML);
    if ($XMLDocument === false) {
        $unsafe = true;
    } else {
        $count = $XMLDocument->children()->count();
        if ($count != $sourceCount) {
            $unsafe = true;
        }
    }
    echo ($unsafe ? 'Unsafe' : 'Safe') . ': <pre>' . htmlspecialchars($string, ENT_QUOTES, 'utf-8') . '</pre><br />' . "\n";
}
?>