
When is it best to sanitize user input?

User equals untrustworthy. Never trust an untrustworthy user's input. I get that. However, I am wondering when the best time to sanitize input is. For example, do you blindly store user input and then sanitize it whenever it is accessed/used, or do you sanitize the input immediately and then store this "cleaned" version? Maybe there are also some other approaches I haven't thought of in addition to these. I am leaning more towards the first method, because any data that came from user input must still be approached cautiously, and the "cleaned" data might still unknowingly or accidentally be dangerous. Either way, what method do people think is best, and for what reasons?

Aaron asked Aug 29 '08 18:08



1 Answer

Unfortunately, almost none of the participants clearly understands what they are talking about. Literally. Only Kibbee managed to get it straight.

This topic is all about sanitization. But the truth is that the broadly defined "general purpose sanitization" everyone is so eager to talk about just doesn't exist.

There are a zillion different mediums, and each requires its own distinct data formatting. Moreover, even a single medium requires different formatting for its different parts. Say, HTML formatting is useless for JavaScript embedded in an HTML page. Or string formatting is useless for numbers in an SQL query.
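
To make the HTML versus JavaScript point concrete, here is a minimal Python sketch. The two helper names are made up for illustration, and the JavaScript case assumes the value is placed inside a `<script>` block as a string literal.

```python
import html
import json


def format_for_html_text(value: str) -> str:
    # Formatting for HTML text or a quoted attribute value.
    return html.escape(value, quote=True)


def format_for_js_literal(value: str) -> str:
    # Formatting for a JavaScript string literal inside a <script> block.
    # json.dumps yields a valid JS string literal; "<" is then written as a
    # unicode escape so the data cannot contain a literal "</script>".
    return json.dumps(value).replace("<", "\\u003c")


value = 'first line\nsecond "line" </script>'

# HTML formatting leaves the raw newline in place, which would break
# a JavaScript string literal.
html_text = format_for_html_text(value)

# JS formatting escapes the newline, the quotes and the "<".
js_literal = format_for_js_literal(value)

print(html_text)
print(js_literal)
```

Same data, two parts of the same page, two different formatting rules.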

As a matter of fact, "sanitization as early as possible", as suggested in the most upvoted answers, is simply impossible, because one cannot tell in advance which medium, or which part of a medium, the data will end up in. Say we are preparing to defend against SQL injection by escaping everything that moves. But whoops! Some required fields weren't filled in, and we have to put the data back into the form instead of the database... with all the slashes added.
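
A small Python sketch of that failure. `add_slashes` is a hypothetical stand-in for whatever "escape everything on input" step is in use, and `render_input` is an equally hypothetical form helper.

```python
import html


def add_slashes(value: str) -> str:
    # Hypothetical "escape on input" step, modelled on SQL-style slashing.
    return value.replace("\\", "\\\\").replace("'", "\\'")


def render_input(name: str, value: str) -> str:
    # Formats the value for an HTML attribute right before it is written out.
    return f'<input name="{html.escape(name)}" value="{html.escape(value)}">'


submitted = "O'Reilly"

# Escaped early for SQL, then sent back to the form: the user sees a slash.
escaped_early = add_slashes(submitted)
broken_field = render_input("surname", escaped_early)

# Raw value kept as-is and formatted only for the medium it actually reaches.
correct_field = render_input("surname", submitted)

print(broken_field)
print(correct_field)
```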

On the other hand, we diligently escaped all the "user input"... but in the SQL query there are no quotes around it, because it is a number or an identifier. So no "sanitization" helped us at all.
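
Here is a sketch of the unquoted-number case using Python's `sqlite3` module. The table, the `escape_string` helper and the allow-list of sortable columns are all assumptions made for the example.

```python
import sqlite3

ALLOWED_SORT_COLUMNS = {"id", "name"}  # hypothetical allow-list


def escape_string(value: str) -> str:
    # String-literal escaping: it only means something inside quotes.
    return value.replace("'", "''")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

user_id = "1 OR 1=1"  # attacker-controlled, contains no quote characters

# Escaping changes nothing here, and the value lands outside any quotes.
unsafe_sql = "SELECT name FROM users WHERE id = " + escape_string(user_id)
leaked = conn.execute(unsafe_sql).fetchall()  # every row matches


def find_user(conn, raw_id, sort_by):
    # An identifier is checked against an allow-list; a number is formatted
    # as a number (int() raises ValueError on anything else) and bound.
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError("unknown sort column")
    sql = f"SELECT name FROM users WHERE id = ? ORDER BY {sort_by}"
    return conn.execute(sql, (int(raw_id),)).fetchall()


print(leaked)
print(find_user(conn, "1", "name"))
```

The number and the identifier each get the formatting their part of the query requires, and string escaping is not it.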

On the third hand: okay, we did our best sanitizing the terrible, untrustworthy and disdained "user input"... but some internal process then used this very data without any formatting (we had done our best already!), and whoops! We've got second-order injection in all its glory.
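
Second-order injection can be sketched the same way with `sqlite3`. The schema and the stored name below are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("CREATE TABLE posts (author TEXT, body TEXT)")
conn.execute("INSERT INTO posts VALUES ('alice', 'hello'), ('bob', 'secret')")

# Step 1: the hostile name is stored correctly through a placeholder.
conn.execute("INSERT INTO users VALUES (?)", ("x' OR '1'='1",))

# Step 2: an internal job reads it back and treats it as "already clean".
stored_name = conn.execute("SELECT name FROM users").fetchone()[0]

unsafe_sql = f"SELECT body FROM posts WHERE author = '{stored_name}'"
leaked = conn.execute(unsafe_sql).fetchall()  # both posts come back

# Formatting again at this point of use closes the hole.
safe = conn.execute(
    "SELECT body FROM posts WHERE author = ?", (stored_name,)
).fetchall()  # no post has that author

print(leaked)
print(safe)
```

Where the data came from makes no difference: it needs formatting every time it enters a query.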

So, from a real-life usage point of view, the only proper way is:

  • formatting, not some vague "sanitization"
  • right before use
  • according to the rules of the particular medium
  • and even following the sub-rules required for that medium's different parts.
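
Put together, the list above might look like this in Python. This is only a sketch under assumptions: the `comments` table and the `/search` URL are made up, and the three mediums are just examples.

```python
import html
import sqlite3
from urllib.parse import quote

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (author TEXT)")

raw = 'Tom & "Jerry" <b>'

# Medium: SQL data literal. The driver formats it through a placeholder,
# so the stored value is exactly what was submitted.
conn.execute("INSERT INTO comments (author) VALUES (?)", (raw,))
stored = conn.execute("SELECT author FROM comments").fetchone()[0]

# Medium: HTML text or attribute value, formatted right before output.
as_html = html.escape(stored, quote=True)

# Medium: URL query-string component, formatted right before output.
as_url = "/search?author=" + quote(stored, safe="")

print(stored)
print(as_html)
print(as_url)
```

The value is stored unmodified, and each destination applies its own rules at the moment of use.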

Your Common Sense answered Sep 28 '22 06:09