
filter_var versus preg_match

Tags:

regex

php

Morning all

I'm converting a site that I'm working on to be compliant with the latest version of PHP, so I'm going through and replacing all instances of ereg with their non-deprecated equivalents. However, I was told about a handy built-in PHP function called filter_var.

My question is: would it make sense to go with filter_var over preg_match? Is there a performance boost or any other benefit to choosing one over the other, and if so, what is it?

canadiancreed asked Oct 09 '09 14:10


3 Answers

First of all, the PHP Manual page on filtering: https://php.net/manual/en/book.filter.php

Second, context is key. Generally speaking, the filter functions are designed to work on either external input (scalars or arrays) or data that already exists inside your program. External input comes from sources such as an HTTP request (for example, a form submission) or the PHP engine and server environment.

Filter functions with the filter_input prefix allow you to bypass $_SERVER, $_COOKIE, $_POST, and $_GET superglobals entirely. Although you generally specify "where" you want the data from, filter functions do not explicitly utilize $_POST, $_GET, $_COOKIE, and $_SERVER. Changes you make to the variable/array elements will not show up in $_GET, $_POST, or $_SERVER, so using filter this way is a paradigm shift and may change the flow of your application significantly. In other words, you have to track the external input yourself. I do this for initial sanitizing (stripping, replacing, altering, etc...) of external input. I no longer use $_POST, $_GET, or $_SERVER at all. Although, I do still use $_FILES.
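To make the point about bypassing the superglobals concrete, here is a minimal sketch. The helper name read_posted_email() and the 'email' field are hypothetical; filter_input(), INPUT_POST, and FILTER_VALIDATE_EMAIL are the real built-ins.

```php
<?php
// Hypothetical helper: read and validate the 'email' POST field
// without going through $_POST.
function read_posted_email(): ?string
{
    // filter_input() returns null when the field is absent from the
    // request and false when validation fails.
    $email = filter_input(INPUT_POST, 'email', FILTER_VALIDATE_EMAIL);

    return is_string($email) ? $email : null;
}

// filter_input() reads the original request data, so this assignment
// has no effect on what it returns.
$_POST['email'] = 'changed@example.com';

$email = read_posted_email();
```

When run from the command line there is no POST request at all, so $email ends up null even though $_POST['email'] was set by hand. That is the "track the external input yourself" trade-off described above.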

Functions prefixed with filter_var (filter_var() and filter_var_array()) are for filtering any variable or array that already exists within your program. I use these after having used filter_input. There are many filters you can use in both cases, but your question is about performance.
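As a rough illustration of filtering data that is already in the program, the sketch below runs filter_var() on one value and filter_var_array() on a whole array. The $record contents and the age range are made up for the example.

```php
<?php
// Hypothetical data that already lives inside the program.
$record = [
    'age'   => '42',
    'email' => 'not-an-email',
    'site'  => 'https://php.net/',
];

// Single value: returns the converted value, or false on failure.
$age = filter_var(
    $record['age'],
    FILTER_VALIDATE_INT,
    ['options' => ['min_range' => 0, 'max_range' => 150]]
);

// Whole array: one filter per key. Each failed element becomes false.
$checked = filter_var_array($record, [
    'age'   => FILTER_VALIDATE_INT,
    'email' => FILTER_VALIDATE_EMAIL,
    'site'  => FILTER_VALIDATE_URL,
]);
```

Note that a valid integer comes back as an int, not as the original string, so compare results against false with === rather than relying on truthiness.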

If you choose to use the FILTER_VALIDATE_REGEXP filter with any of the filtering functions, I cannot imagine this indirect approach being more efficient than directly using preg_match(). As far as the other filters go, if they are simply 'n' number of methods/functions removed from a regular expression call, I cannot see an improvement in efficiency there either.
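For comparison, this is roughly what the two routes look like side by side. The username pattern and the input value are assumptions chosen for the example; no timing is measured here.

```php
<?php
// Hypothetical validation rule: 3 to 16 letters, digits, or underscores.
$pattern = '/^[a-z0-9_]{3,16}$/i';
$input   = 'canadian_creed';

// Direct route: preg_match() returns 1 on a match.
$direct = preg_match($pattern, $input) === 1;

// Indirect route: FILTER_VALIDATE_REGEXP takes the same PCRE pattern
// and returns the input string on a match, or false otherwise.
$viaFilter = filter_var(
    $input,
    FILTER_VALIDATE_REGEXP,
    ['options' => ['regexp' => $pattern]]
) !== false;
```

Both routes accept the same pattern and give the same answer; the filter version simply adds a layer around the regular expression.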

I see the filter functions as something that was designed to help improve consistency for filtering tasks that happen across many applications. They are probably not designed to be more efficient, but they are definitely designed to be more accessible than regular expressions (though I am very good with regular expressions). I prefer having direct knowledge of what's happening, but some people don't, or couldn't care less. However, the filter functions open the door to filtering strings for those who don't understand regular expressions and other basic web application security processes.

One can certainly live without using the filter functions, though.

What's more, I use the filter functions in conjunction with my own sanitizer and validator classes. So, I'm not asking PHP to think for me, I'm just using it to augment what I already know how to do (just in case their functions get something I miss). Defense in depth.

In summary, your best bet is simply to use preg_match(), unless you intend to change the flow of input into your application (the filter_input functions). Even then, there won't be a performance boost, but you can bypass $_SERVER, $_POST, and $_GET. You can also take advantage of simpler, structured, consistent filtering functionality, with the ability to use a callback (FILTER_CALLBACK) to call custom, in-house methods/functions (which I do as well). And you can still use your own regular expressions with the filter functions via the FILTER_VALIDATE_REGEXP filter, but again, I see no reason to believe that the performance of your application will improve if you do. Maintainability? Maybe. It depends on the person writing the code.
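A minimal sketch of the FILTER_CALLBACK idea mentioned above follows. The function normalize_username() stands in for whatever in-house sanitizer you already have; it is hypothetical and not part of PHP.

```php
<?php
// Hypothetical in-house sanitizer: trim whitespace and lowercase.
function normalize_username(string $value): string
{
    return strtolower(trim($value));
}

// FILTER_CALLBACK passes the value to the callable named in 'options'
// and returns whatever the callable returns.
$clean = filter_var(
    '  CanadianCreed ',
    FILTER_CALLBACK,
    ['options' => 'normalize_username']
);
```

The same options array works with filter_input() and filter_var_array(), which is what lets custom code sit behind the same consistent interface as the built-in filters.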

Anthony Rutledge answered Oct 01 '22 23:10


filter_var — Filters a variable with a specified filter
preg_match — Perform a regular expression match

I guess you could use filter_var to filter variables, but I don't think it is a good replacement for preg_match when upgrading from ereg: filter_var is not a regex function, so you would have to rewrite a lot of the functionality/logic to do this.

Phill Pafford answered Oct 02 '22 00:10


Switching over to filter_var() would actually be a great idea. You wouldn't be reusing your existing regular expressions, but you WOULD be able to eliminate many of them entirely. Often, the regexes in our apps are only there for simple validation and filtering, which is exactly what the filter_var() function is intended for.

For example, in your code, you may already have:

if (eregi('\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b', $_POST['email'])) {
    echo "valid";
}

This could be replaced by the prettier version (not relying on custom regular expressions):

if (filter_var($_POST['email'], FILTER_VALIDATE_EMAIL)) {
    echo "valid";
}

The filter_var() function can also sanitize a value, stripping characters that don't belong in the particular kind of data you're examining and returning the cleaned string (instead of a boolean):

$clean = filter_var($_POST['email'], FILTER_SANITIZE_EMAIL);

This kind of usage with filter_var() would replace ereg_replace() type functions.

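However, for the simplest of upgrades, you can move from the ereg*() family to the PCRE preg_*() functions, which are not deprecated in PHP 5.3+. This is not quite a matter of adding a 'p' to the name: ereg() and eregi() become preg_match(), ereg_replace() and eregi_replace() become preg_replace(), and split() becomes preg_split(). The pattern also needs delimiters (for example /.../), and the case-insensitive variants need the i modifier.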
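As a sketch of what that conversion tends to look like, the snippet below shows a few typical rewrites. The patterns and sample strings are illustrative assumptions, not taken from the asker's code; the old calls appear only in comments because they no longer exist in current PHP.

```php
<?php
$email = 'user@example.com';

// was: eregi('^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$', $email)
// Add delimiters, and the i modifier replaces the "i" in eregi.
$isEmail = preg_match('/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i', $email) === 1;

// was: ereg_replace('[[:space:]]+', ' ', $text)
// POSIX character classes such as [[:space:]] also work in PCRE.
$collapsed = preg_replace('/[[:space:]]+/', ' ', "a  \t b");

// was: split(',', $csv)
$parts = preg_split('/,/', 'x,y,z');
```

If a pattern contains the delimiter character itself, either escape it or pick a different delimiter such as # or ~.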

Thomas Hunter II answered Oct 02 '22 00:10