This question was originally asked in a comment here.
Is filter_input() still necessary if you’re using parameterized queries and htmlspecialchars() before you print any user-supplied data?
It seems unnecessary to me, but I've always been told to "Filter Input, Escape Output". So, aside from a database (or another form of storage), is there any need to filter inputted data?
filter_input() is a built-in PHP function that gets a specific external variable by name and optionally filters it. It is used to validate variables from insecure sources, such as user input from a form.
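filter_input() vs. filter_var(): if the variable doesn't exist, filter_input() returns null. filter_var() has no notion of a missing variable, because it filters whatever value you pass to it. Reading an unset key such as $_GET['name'] therefore raises an undefined index notice first (a warning in PHP 8), and the resulting null is filtered to an empty string under the default filter.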
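The difference for a missing variable can be seen in a minimal sketch. It assumes the script runs from the command line, where no query string exists; the key name 'missing' is purely illustrative.

```php
<?php
// Run from the CLI, so there is no query string and 'missing' is never set.
// filter_input() reports an absent variable as null, without any diagnostic.
$fromInput = filter_input(INPUT_GET, 'missing', FILTER_VALIDATE_INT);

// filter_var() filters whatever value it is handed. Reading $_GET['missing']
// directly would trigger the undefined index/key diagnostic, so the value is
// defaulted to null here before filtering.
$raw = $_GET['missing'] ?? null;
$fromVar = filter_var($raw, FILTER_VALIDATE_INT);

var_dump($fromInput); // NULL: the variable does not exist
var_dump($fromVar);   // bool(false): null is not a valid integer
```

With filter_input() you can tell "not sent" (null) apart from "sent but invalid" (false); with filter_var() you have to check for the key yourself.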
PHP filters are used to validate and sanitize external input. The PHP filter extension has many of the functions needed for checking user input, and is designed to make data validation easier and quicker.
Well, there are going to be differing opinions.
My take is that you should always use it (or the filter extension in general). There are at least three reasons for this:
1. Sanitizing input is something you should always do. Since the function gives you this capability, there is really no reason to find other ways of sanitizing input. Since it is an extension, the filter will also be much faster and most likely safer than most pure-PHP solutions out there, which certainly does not hurt. The only exception is if you need a more specialized filter. Even then you should grab the value using the FILTER_UNSAFE_RAW filter (see #3).
2. There are a lot of goodies in the filter extension. It can save you hours of writing sanitizing and validation code. Of course, it does not cover every single case, but there is enough that you can focus more on specific filtering/validating code.
3. Using the function is very good for when you are debugging/auditing your code. When the function is used you know exactly what the input will be. For example, if you use the FILTER_SANITIZE_NUMBER_INT filter then you can be sure that the input will contain only digits and plus/minus signs -- no SQL injections, no HTML or JavaScript code, etc. If you, on the other hand, use something like FILTER_UNSAFE_RAW then you know that it should be treated carefully, and that it can easily cause security problems.
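One caveat is worth checking in code: FILTER_SANITIZE_NUMBER_INT only strips characters, so what remains is not guaranteed to be a valid integer, whereas FILTER_VALIDATE_INT accepts or rejects the whole value. The sketch below compares the two; the helper name checkInt and the sample strings are made up for illustration.

```php
<?php
// Hypothetical helper: returns [sanitized string, validated int or false].
function checkInt(string $raw): array
{
    // Sanitizing removes every character except digits, '+' and '-'.
    $sanitized = filter_var($raw, FILTER_SANITIZE_NUMBER_INT);
    // Validating returns an int only if the whole value is an integer.
    $validated = filter_var($raw, FILTER_VALIDATE_INT);

    return [$sanitized, $validated];
}

// Sample values standing in for request data.
foreach (['42', '4-2', 'abc', '12; DROP TABLE users'] as $raw) {
    var_dump(checkInt($raw));
}
// '42'                   gives ['42', 42]
// '4-2'                  gives ['4-2', false]
// 'abc'                  gives ['', false]
// '12; DROP TABLE users' gives ['12', false]
```

The sanitized result is free of SQL and HTML metacharacters, but values such as '4-2' or '' still need a validity check before being treated as a number.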
As Sverri M. Olsen says, there are differing opinions on this.
I agree very much with the philosophy Filter Input, Escape Output.
Is filter_input() still necessary if you’re using parameterized queries and htmlspecialchars() before you print any user-supplied data?
Short answer: IMO, No. It's not necessary, but can be useful in some cases.
The filter_input function has many useful filters, and I do use some of them (e.g. FILTER_VALIDATE_EMAIL). The validate filters are useful for validating input. However, IMO, the ones that transform data should only be used on output.
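As a rough sketch of the "validate, but don't transform" approach: a validate filter either returns the value unchanged or signals failure. The helper name validEmailOrNull is hypothetical, and filter_var() is used instead of filter_input() only so the example runs without a real request.

```php
<?php
// Hypothetical helper: returns the address unchanged when it is valid,
// or null when it is not. Nothing is escaped or rewritten here.
function validEmailOrNull(string $raw): ?string
{
    $result = filter_var($raw, FILTER_VALIDATE_EMAIL);

    return $result === false ? null : $result;
}

var_dump(validEmailOrNull('user@example.com')); // string(16) "user@example.com"
var_dump(validEmailOrNull('not an email'));     // NULL
```

The accepted value is stored as received; any escaping is left to the code that later outputs it.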
Some people encourage escaping input. Indeed, the examples given on the filter_input manual page seem to encourage this as well.
$search_html = filter_input(INPUT_GET, 'search', FILTER_SANITIZE_SPECIAL_CHARS);
$search_url = filter_input(INPUT_GET, 'search', FILTER_SANITIZE_ENCODED);
The only examples are for escaping. That combined with the name of the function (filter_input) seems to suggest that escaping input is good practice. Escaping is necessary, but, IMO, should be done before output, not on input. At least the return values are being stored in appropriately named variables.
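For comparison, here is a minimal sketch of escaping at output time instead. The raw value is kept as received and each output context gets its own escaping; the search term and the /search URL are made up for illustration.

```php
<?php
// Hypothetical raw search term, kept exactly as the user typed it.
$search = 'Tom & "Jerry" <b>';

// Escape for the specific output context, at the point of output.
$forHtml = htmlspecialchars($search, ENT_QUOTES, 'UTF-8');
$forUrl = rawurlencode($search);

echo '<p>Results for ', $forHtml, "</p>\n";
echo '<a href="/search?q=', $forUrl, "\">link</a>\n";
// $forHtml is 'Tom &amp; &quot;Jerry&quot; &lt;b&gt;'
// $forUrl  is 'Tom%20%26%20%22Jerry%22%20%3Cb%3E'
```

Because $search itself is never modified, it can still be passed to a parameterized query, logged, or compared without undoing any encoding.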
I strongly disagree with escaping input. I've already come across real world situations where transforming data too early is a problem.
For example, Google Analytics processes input in a way that causes my encoded ampersands (%26) to be decoded before query parameters are excluded. The result is that I have stats for query parameters that don't actually exist in my URLs. See my question regarding this issue, which remains unsolved.
You may also want to read Why escape-on-input is a bad idea. Here are some excerpts that I agree with, just in case the article disappears [emphasis in the original].
[...] escape-on-input is just wrong [...] it is a layering violation — it mixes an output formatting concern into input handling. Layering violations make your code much harder to understand and maintain, because you have to take into account other layers instead of letting each component and layer do its own job.
and
You have corrupted your data by default. The system [...] is now lying about what data has come in.
and
Escaping on input will not only fail to deal with the problems of more than one output, it will actually make your data incorrect for many outputs.
and
PHP used to have a feature called magic quotes. It was an escape-on-input feature that [...] caused all kinds of problems. [...] According to Lerdorf, the much newer PHP 'filter' extension is "magic_quotes done right". But it still suffers from almost all the problems described here.
So how is the filter extension better than magic quotes (other than the fact that it has many different filters)? The filters cause many of the same issues that magic quotes did.
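The corruption described in those excerpts can be reproduced in a few lines. This is only an illustrative sketch with a made-up input value; it assumes the escaped value is stored and later passes through a template that also escapes on output.

```php
<?php
// Hypothetical raw input.
$raw = 'AT&T';

// Escape-on-input: the stored value is already HTML-encoded.
$storedEscaped = filter_var($raw, FILTER_SANITIZE_SPECIAL_CHARS); // 'AT&#38;T'

// A template that escapes on output now encodes the value a second time.
$doubleEscaped = htmlspecialchars($storedEscaped, ENT_QUOTES, 'UTF-8'); // 'AT&amp;#38;T'

// Non-HTML outputs receive the wrong data as well.
$badJson = json_encode(['name' => $storedEscaped]); // {"name":"AT&#38;T"}
$badLength = strlen($storedEscaped);                // 8 instead of 4

// Keeping the raw value and escaping per output avoids all of this.
$okHtml = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'); // 'AT&amp;T'
$okJson = json_encode(['name' => $raw]);               // {"name":"AT&T"}

var_dump($storedEscaped, $doubleEscaped, $badJson, $badLength, $okHtml, $okJson);
```

A browser would render the double-escaped string as the literal text "AT&#38;T", which is the kind of visible breakage magic quotes used to cause with stray backslashes.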
Here are the coding conventions I use:
Terminology
For my purposes here, this is how I define the terms used above.
Summary
In general (there may be some exceptions), I'd recommend the following: