
When to use filter_input()

Tags: security, php

This question was originally asked in a comment here.

Is filter_input() still necessary if you’re using parameterized queries and htmlspecialchars() before you print any user-supplied data?

It seems unnecessary to me, but I've always been told to "Filter Input, Escape Output". So, aside from a database (or another form of storage), is there any need to filter inputted data?

Jonathan asked Feb 27 '13 01:02




2 Answers

Well, there are going to be differing opinions.

My take is that you should always use it (or, the filter extension in general). There are at least 3 reasons for this:

  1. Sanitizing input is something you should always do. Since the function gives you this capability, there is really no reason to find other ways of sanitizing input. Since the filter extension is compiled code rather than userland PHP, it will also be faster, and most likely safer, than most hand-written PHP solutions, which certainly does not hurt. The only exception is if you need a more specialized filter. Even then, you should grab the value using the FILTER_UNSAFE_RAW filter (see #3).

  2. There are a lot of goodies in the filter extension. It can save you hours of writing sanitizing and validation code. Of course, it does not cover every single case, but it covers enough that you can focus on the filtering/validating code that is specific to your application.

  3. Using the function is very helpful when you are debugging or auditing your code. When the function is used, you know exactly what the input can be. For example, if you use the FILTER_SANITIZE_NUMBER_INT filter, you can be sure that the input contains nothing but digits and the + and - signs: no SQL injections, no HTML or JavaScript code, etc. (If you need a guaranteed integer, use FILTER_VALIDATE_INT instead.) If, on the other hand, you use something like FILTER_UNSAFE_RAW, you know that the value must be treated carefully, and that it can easily cause security problems.
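To make point 3 concrete, here is a minimal sketch of what the two integer filters return. It uses filter_var() on a hard-coded string so that it runs outside a web request; filter_input() applies the same filters to request variables. The value of $raw is a made-up stand-in for something like a GET parameter.

```php
<?php
// Hypothetical raw value, standing in for a request parameter.
$raw = "12abc-3";

// Sanitizing removes every character except digits, '+' and '-'.
// The result is still a string and is not guaranteed to be a valid integer.
$sanitized = filter_var($raw, FILTER_SANITIZE_NUMBER_INT);

// Validating returns an int on success and false on failure.
$validated = filter_var($raw, FILTER_VALIDATE_INT);
$valid_id  = filter_var("42", FILTER_VALIDATE_INT);

var_dump($sanitized); // string(4) "12-3"
var_dump($validated); // bool(false)
var_dump($valid_id);  // int(42)
```

Either way, the filter name at the call site tells an auditor what the variable can contain from that point on.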

Sverri M. Olsen answered Oct 11 '22 12:10


As Sverri M. Olsen says, there are differing opinions on this.

I agree very much with the philosophy Filter Input, Escape Output.

Is filter_input() still necessary if you’re using parameterized queries and htmlspecialchars() before you print any user-supplied data?

Short answer: IMO, no. It's not necessary, but it can be useful in some cases.


The filter_input function has many useful filters, and I do use some of them (e.g. FILTER_VALIDATE_EMAIL). The validate filters are useful for validating input. However, IMO, the ones that transform data should only be used on output.
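A minimal sketch of a validate filter in use. It calls filter_var() on made-up values so it can run from the command line; in a real request the equivalent call would be something like filter_input(INPUT_POST, 'email', FILTER_VALIDATE_EMAIL), where the field name 'email' is only an assumption for illustration.

```php
<?php
// Hypothetical submitted values.
$good = filter_var('user@example.com', FILTER_VALIDATE_EMAIL);
$bad  = filter_var('not an email', FILTER_VALIDATE_EMAIL);

// A validate filter returns the value unchanged on success and false on failure.
// (filter_input() additionally returns null when the variable is not set.)
var_dump($good); // string(16) "user@example.com"
var_dump($bad);  // bool(false)
```

The value is either accepted as-is or rejected; nothing is rewritten, which is why validation is safe to do on input.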

Some people encourage escaping input. Indeed, the examples given on the filter_input manual page seem to encourage this as well.

$search_html = filter_input(INPUT_GET, 'search', FILTER_SANITIZE_SPECIAL_CHARS);
$search_url = filter_input(INPUT_GET, 'search', FILTER_SANITIZE_ENCODED);

The only examples are for escaping. That, combined with the name of the function (filter_input), seems to suggest that escaping input is good practice. Escaping is necessary but, IMO, should be done before output, not on input. At least the return values are being stored in appropriately named variables.

I strongly disagree with escaping input. I've already come across real world situations where transforming data too early is a problem.

For example, Google Analytics processes input in a way that causes my encoded ampersands (%26) to be decoded before query parameters are excluded. The result is that I have stats for query parameters that don't actually exist in my URLs. See my question regarding this issue, which remains unsolved.

You may also want to read Why escape-on-input is a bad idea. Here are some excerpts that I agree with, just in case the article disappears [emphasis in the original].

[...] escape-on-input is just wrong [...] it is a layering violation — it mixes an output formatting concern into input handling. Layering violations make your code much harder to understand and maintain, because you have to take into account other layers instead of letting each component and layer do its own job.

and

You have corrupted your data by default. The system [...] is now lying about what data has come in.

and

Escaping on input will not only fail to deal with the problems of more than one output, it will actually make your data incorrect for many outputs.

and

PHP used to have a feature called magic quotes. It was an escape-on-input feature that [...] caused all kinds of problems. [...] According to Lerdorf, the much newer PHP 'filter' extension is "magic_quotes done right". But it still suffers from almost all the problems described here.

So how is the filter extension better than magic quotes (other than the fact that it has many different filters)? The filters cause many of the same issues that magic quotes did.
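One of those shared issues can be sketched in a few lines: a value escaped on input is already wrong for any output other than the one it was escaped for, and escaping it again for HTML double-encodes it. The search term is made up, and filter_var() stands in for filter_input() so the snippet runs outside a web request.

```php
<?php
$raw = 'Tom & Jerry'; // hypothetical search term

// Escape-on-input: the value kept around is already HTML-encoded.
$stored = filter_var($raw, FILTER_SANITIZE_SPECIAL_CHARS);

// Escaping the stored value again at output time double-encodes it,
// and percent-encoding it puts the HTML entity into the URL.
$html_from_stored = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
$url_from_stored  = rawurlencode($stored);

// Escape-on-output: each context gets its own encoding of the raw value.
$html_from_raw = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'); // Tom &amp; Jerry
$url_from_raw  = rawurlencode($raw);                          // Tom%20%26%20Jerry

echo $html_from_stored, "\n", $html_from_raw, "\n";
echo $url_from_stored, "\n", $url_from_raw, "\n";
```

Comparing the two pairs of lines printed shows that only the values derived from the raw input are correct in both contexts.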


Here are the coding conventions I use:

  • values in $_POST, $_GET, $_REQUEST, etc. should not be escaped and should always be considered unsafe
  • values should be validated [1] before being written to the database or stored in $_SESSION
  • values expected to be numeric or boolean should be sanitized [2] before being written to the database or stored in $_SESSION
  • trust that numeric and boolean values from the database and $_SESSION are indeed numeric or boolean
  • string values should be SQL-escaped before being used directly in any SQL query (non-string values should be sanitized [2]); alternatively, use prepared statements
  • string values should be HTML-escaped before being used in HTML output (non-string values should be sanitized [2])
  • string values should be percent-encoded before being used in query strings (non-string values should be sanitized [2])
  • use a variable naming convention (such as *_url, *_html, *_sql) to store transformed data

Terminology

For my purposes here, this is how I define the terms used above.

  1. to validate means to confirm any assumptions being made about the data such as having a specific format or required fields having a value
  2. to sanitize means to confirm values are exactly as expected (e.g. $id_num should contain nothing but digits)
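The conventions and definitions above might look like this in practice. This is only a sketch: the variable names, the URL path, and the fallback to page 1 are assumptions made for illustration, and the hard-coded strings stand in for raw request values.

```php
<?php
// Hypothetical raw request values; never escaped on input, always treated as unsafe.
$search = 'cats & "dogs"';
$page   = '3';

// The value expected to be numeric is checked first; fall back to 1 if it is invalid.
$page = filter_var($page, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
if ($page === false) {
    $page = 1;
}

// Transform strings only at the point of output, into variables named for their context.
$search_html = htmlspecialchars($search, ENT_QUOTES, 'UTF-8');
$search_url  = rawurlencode($search);

echo '<a href="/search?q=' . $search_url . '&amp;page=' . $page . '">'
    . $search_html . "</a>\n";
```

The raw $search is left untouched, so it can still be bound to a prepared statement or encoded for another output later.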

Summary

In general (there may be some exceptions), I'd recommend the following:

  • use validate filters on input
  • use sanitize filters on output
  • remember TIMTOWTDI: for example, I prefer htmlspecialchars() (which has more options) over FILTER_SANITIZE_FULL_SPECIAL_CHARS or FILTER_SANITIZE_SPECIAL_CHARS (which escapes line breaks)
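The line-break difference mentioned in the last point can be checked with a short sketch. The input string is made up for illustration.

```php
<?php
$raw = "line one\nline two"; // hypothetical multi-line input

// FILTER_SANITIZE_SPECIAL_CHARS also encodes characters with an ASCII value
// below 32, so the newline is turned into a numeric entity.
$via_filter = filter_var($raw, FILTER_SANITIZE_SPECIAL_CHARS);

// htmlspecialchars() only touches &, <, > and quotes; the newline survives.
$via_function = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

var_dump(strpos($via_filter, "\n") === false); // bool(true)
var_dump($via_function === $raw);              // bool(true)
```

This matters if the escaped text is later passed through something like nl2br(), which needs the original newline characters.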

toxalot answered Oct 11 '22 14:10