
Which should be performed first: sanitizing or validation?

I have a name field in my registration form. It will be stored in the database in a column called user_name varchar(20). Clearly I should validate the user input. Suppose I validate this field first, with the code below:

<?php
if (empty($_POST['name']) || strlen($_POST['name']) > 20) {
    // send an "invalid input" error
} else {
    // encoding happens AFTER the length check, so the value can grow past 20
    $name = htmlspecialchars($_POST['name']);
    // check for SQL injection
    // insert $name into the database
}
?>

If a user enters a name like <i> some one </i>, the string length is 17, so the else branch runs and the name becomes &lt;i&gt; some one &lt;/i&gt;, which is 29 characters long and will produce an error when inserted into the database. If I then tell the user that the input is too long, he or she will be confused. What should I do? What is the best approach?

naazanin asked Oct 18 '13 14:10


1 Answer

In general, one should sanitize first - "for your protection, and theirs." This includes stripping out any invalid characters (sensitive to the character encoding, of course). If a field should only contain letters and spaces, then strip out anything that isn't one of those first.

With that done, you then validate the results - is the name already used (for unique fields), is it the right size, is it not blank?
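To make the order concrete, here is a minimal sketch of "sanitize, then validate" for a name field. The function names sanitize_name and validate_name are made up for illustration, and the rule "letters and spaces only" is an assumption about what the field allows, not a general recommendation.

```php
<?php
// Hypothetical helper: keep only letters and spaces, collapse repeated
// spaces, and trim the ends. Assumes the input is UTF-8.
function sanitize_name(string $raw): string
{
    // preg_replace() returns null on invalid UTF-8, hence the ?? '' fallback.
    $clean = preg_replace('/[^\p{L} ]+/u', '', $raw) ?? '';
    $clean = preg_replace('/ {2,}/', ' ', $clean);
    return trim($clean);
}

// Hypothetical helper: returns a list of error messages (empty = valid).
// strlen() counts bytes; mb_strlen() would count characters if the
// mbstring extension is available.
function validate_name(string $name, int $maxLength = 20): array
{
    $errors = [];
    if ($name === '') {
        $errors[] = 'Name is required.';
    } elseif (strlen($name) > $maxLength) {
        $errors[] = "Name must be at most {$maxLength} characters.";
    }
    return $errors;
}

// Validation always sees the value that would actually be stored.
$name   = sanitize_name($_POST['name'] ?? '');
$errors = validate_name($name);
```

Because the length check runs on the sanitized value, any error shown to the user refers to the same string that would end up in the database.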

The reason you give is precisely the right one - to maximize the user experience. Don't confuse the user, if you can avoid it. This helps protect from dumb copy & paste behavior, but you have to be careful - if I want my name recorded as "Ke$h@", I may or may not be ok with changing it to "Keh".

Secondly, it is also to prevent bugs.

What happens when you want to create usernames that don't allow special characters? Suppose I enter "Brian", your system rejects it because the name is already in use, and I then submit "Brian$". If you validate first, the name is not in use; then you strip special characters and you are left with "Brian". Uh oh - now you either have to validate AGAIN, or you'll get a weird error that account creation failed (if your database is set to require unique usernames, for instance), or worse, it will succeed and existing user accounts get overwritten or corrupted.
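A small sketch of that "Brian$" case, with an in-memory array standing in for the users table. The names $taken, strip_special and is_available are hypothetical and only there to show the ordering problem.

```php
<?php
// Hypothetical stand-in for the existing usernames in the database.
$taken = ['Brian'];

// Hypothetical sanitizer: drop everything except ASCII letters and spaces.
function strip_special(string $raw): string
{
    return preg_replace('/[^A-Za-z ]/', '', $raw);
}

// Hypothetical uniqueness check against the in-memory list.
function is_available(string $name, array $taken): bool
{
    return !in_array($name, $taken, true);
}

$input = 'Brian$';

// Validate first: "Brian$" looks free, but what gets stored is "Brian".
$passedBefore = is_available($input, $taken);   // true
$stored       = strip_special($input);          // "Brian"
$collides     = !is_available($stored, $taken); // true: duplicate slipped through

// Sanitize first: the check sees "Brian" and rejects it right away.
$passedAfter = is_available(strip_special($input), $taken); // false
```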

Another example is minimum field lengths: if you require a name to be at least 3 letters long and only accept letters, and I enter "no", you'd reject it; but if I enter "no@#$%", you might say it was valid (long enough), sanitize it, and now it isn't valid anymore, etc.
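The minimum-length case can be sketched the same way. The helpers letters_only and long_enough are invented for this example, and the limit of 3 is taken from the paragraph above.

```php
<?php
// Hypothetical sanitizer: keep ASCII letters only.
function letters_only(string $raw): string
{
    return preg_replace('/[^A-Za-z]/', '', $raw);
}

// Hypothetical rule: a name needs at least $min characters.
function long_enough(string $name, int $min = 3): bool
{
    return strlen($name) >= $min;
}

$input = 'no@#$%';

// Validate first: 6 characters, so it passes...
$validBefore = long_enough($input);                 // true
// ...but the sanitized value that would be stored is too short.
$stillValid  = long_enough(letters_only($input));   // false ("no")
```

With sanitizing done first, only the second check exists, so there is nothing to re-validate.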

The easy way to avoid this is to sanitize first, and then you don't have to double-think about validation.

However, Niet was right about not encoding data before storage; it is generally much easier to set up output into HTML as being encoded when appropriate than it is to remember to decode it when you just want the plain text (for entry into text boxes, JSON strings, etc). Most test cases you'll use won't include data with HTML entities, so it's easy to introduce silly bugs that aren't easily caught.
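Roughly, "store plain text, encode on output" could look like the sketch below. The value of $stored and the helper name e are assumptions for illustration; htmlspecialchars and json_encode are the standard PHP functions.

```php
<?php
// Hypothetical raw value, exactly as it was stored in the database.
$stored = 'Tom & <i>Jerry</i>';

// Hypothetical shorthand for HTML-encoding at output time.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// HTML context: encode when writing into the page.
$html = '<input type="text" name="name" value="' . e($stored) . '">';

// JSON context: no HTML encoding, json_encode() does its own escaping.
$json = json_encode(['name' => $stored]);
```

Each output context applies its own encoding to the same plain value, so nothing ever has to be decoded.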

The big problem is that when such a bug is introduced, it can quickly lead to data corruption that is not easily solved. Example: you have plain text, output it to a text field incorrectly as HTML entities, the form gets submitted back and you re-encode it... every time it gets opened/resubmitted it gets re-encoded. With a busy site/form you could end up with thousands of differently encoded entries, with no clear way to determine what was and what was not intended to be HTML encoded.
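The re-encoding effect is easy to reproduce. This is only a sketch with a made-up value; it relies on htmlspecialchars encoding existing entities again, which is its default behavior.

```php
<?php
// Hypothetical plain-text value entered by a user.
$value = 'Tom & Jerry';

// First save with encode-before-storage.
$once  = htmlspecialchars($value, ENT_QUOTES, 'UTF-8'); // Tom &amp; Jerry

// The form is reopened, resubmitted, and encoded again.
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8');  // Tom &amp;amp; Jerry

// Decoding once no longer gets back to the original text.
$decodedOnce = htmlspecialchars_decode($twice, ENT_QUOTES); // Tom &amp; Jerry
```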

Protecting from injection is good, but HTML encoding isn't designed (and must not be relied upon) to do that.
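For the SQL side, the usual tool is a prepared statement rather than any kind of encoding. Below is a minimal sketch using PDO; the function name insert_user and the table name users are assumptions (only the user_name column comes from the question), and $pdo is expected to be an already opened connection.

```php
<?php
// Hypothetical helper: inserts a name using a bound parameter, so the
// value is sent as data and is never concatenated into the SQL text.
function insert_user(PDO $pdo, string $name): void
{
    $stmt = $pdo->prepare('INSERT INTO users (user_name) VALUES (:name)');
    $stmt->execute([':name' => $name]);
}

// Usage, assuming $pdo is an open PDO connection and $name has already
// been sanitized and validated:
// insert_user($pdo, $name);
```

The name is stored as plain text here; HTML encoding is left to whatever code later prints it.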

BrianH answered Oct 15 '22 13:10