Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are these two functions overkill for sanitization?

function sanitizeString($var)
{
    $var = stripslashes($var);
    $var = htmlentities($var);
    $var = strip_tags($var);
    return $var;
}

function sanitizeMySQL($var)
{
    $var = mysql_real_escape_string($var);
    $var = sanitizeString($var);
    return $var;
}

I got these two functions from a book and the author says that by using these two, I can be extra safe against XSS(the first function) and sql injections(2nd func). Are all those necessary?

Also for sanitizing, I use prepared statements to prevent sql injections.

I would use it like this:

$variable = sanitizeString($_POST['user_input']);
$variable = sanitizeMySQL($_POST['user_input']);

EDIT: Get rid of strip_tags for the 1st function because it doesn't do anything. Would using these two functions be enough to prevent the majority of attacks and be okay for a public site?

like image 961
jpjp Avatar asked May 30 '10 19:05

jpjp


3 Answers

To be honest, I think the author of these function has either no idea what XSS and SQL injections are or what exactly the used function do.

Just to name two oddities:

  • Using stripslashes after mysql_real_escape_string removes the slashes that were added by mysql_real_escape_string.
  • htmlentities replaces the chatacters < and > that are used in strip_tags in order to identify tags.

Furthermore: In general, functions that protect agains XSS are not suitable to protect agains SQL injections and vice versa. Because each language and context hast its own special characters that need to be taken care of.

My advice is to learn why and how code injection is possible and how to protect against it. Learn the languages you are working with, especially the special characters and how to escape these.


Edit   Here’s some (probably weird) example: Imagine you allow your users to input some value that should be used as a path segment in a URI that you use in some JavaScript code in a onclick attribute value. So the language context looks like this:

  • HTML attribute value
    • JavaScript string
      • URI path segment

And to make it more fun: You are storing this input value in a database.

Now to store this input value correctly into your database, you just need to use a proper encoding for the context you are about to insert that value into your database language (i.e. SQL); the rest does not matter (yet). Since you want to insert it into a SQL string declaration, the contextual special characters are the characters that allow you to change that context. As for string declarations these characters are (especially) the ", ', and \ characters that need to be escaped. But as already stated, prepared statements do all that work for you, so use them.

Now that you have the value in your database, we want to output them properly. Here we proceed from the innermost to the outermost context and apply the proper encoding in each context:

  • For the URI path segment context we need to escape (at least) all those characters that let us change that context; in this case / (leave current path segment), ?, and # (both leave URI path context). We can use rawurlencode for this.
  • For the JavaScript string context we need to take care of ", ', and \. We can use json_encode for this (if available).
  • For the HTML attribute value we need to take care of &, ", ', and <. We can use htmlspecialchars for this.

Now everything together:

'… onclick="'.htmlspecialchars('window.open("http://example.com/'.json_encode(rawurlencode($row['user-input'])).'")').'" …'

Now if $row['user-input'] is "bar/baz" the output is:

… onclick="window.open(&quot;http://example.com/&quot;%22bar%2Fbaz%22&quot;&quot;)" …

But using all these function in these contexts is no overkill. Because although the contexts may have similar special characters, they have different escape sequences. URI has the so called percent encoding, JavaScript has escape sequences like \" and HTML has character references like &quot;. And not using just one of these functions will allow to break the context.

like image 95
Gumbo Avatar answered Nov 03 '22 01:11

Gumbo


It's true, but this level of escaping may not be appropriate in all cases. What if you want to store HTML in a database?

Best practice dictates that, rather than escaping on receiving values, you should escape them when you display them. This allows you to account for displaying both HTML from the database and non-HTML from the database, and it's really where this sort of code logically belongs, anyway.

Another advantage of sanitizing outgoing HTML is that a new attack vector may be discovered, in which case sanitizing incoming HTML won't do anything for values that are already in the database, while outgoing sanitization will apply retroactively without having to do anything special

Also, note that strip_tags in your first function will likely have no effect, if all of the < and > have become &lt; and &gt;.

like image 20
Matchu Avatar answered Nov 03 '22 01:11

Matchu


You are doing htmlentities (which turns all > into &gt;) and then calling strip_tags which at this point will not accomplish anything more, since there are no tags.

like image 43
Mitch Dempsey Avatar answered Nov 02 '22 23:11

Mitch Dempsey