
Is FILTER_SANITIZE_EMAIL pointless if already using FILTER_VALIDATE_EMAIL?

Tags: php, xss

I am just creating a registration form, and I am looking only to insert valid and safe emails into the database.

Several sites (including w3schools) recommend running FILTER_SANITIZE_EMAIL before FILTER_VALIDATE_EMAIL to be safe. However, this could change the submitted email from an invalid one into a valid one, which may not be what the user wanted. For example:

The user has the email address jeff@gmail.com, but accidentally enters jeff"@gmail.com.

FILTER_SANITIZE_EMAIL would remove the ", making the email jeff@gmail.com, which FILTER_VALIDATE_EMAIL would say is valid even though it's not what the user actually typed.
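The scenario above can be reproduced in a few lines. This is a minimal sketch; the variable names are illustrative, and the address is the mistyped one from the example.

```php
<?php
// The mistyped address from the example above.
$submitted = 'jeff"@gmail.com';

// Validating the raw input rejects it: the stray quote is not allowed there.
$rawResult = filter_var($submitted, FILTER_VALIDATE_EMAIL);

// Sanitizing first silently strips the quote...
$sanitized = filter_var($submitted, FILTER_SANITIZE_EMAIL);

// ...so the same check now passes, for an address the user never typed.
$sanitizedResult = filter_var($sanitized, FILTER_VALIDATE_EMAIL);

var_dump($rawResult);       // bool(false)
var_dump($sanitized);       // string(14) "jeff@gmail.com"
var_dump($sanitizedResult); // string(14) "jeff@gmail.com"
```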

To avoid this problem, I plan to run only FILTER_VALIDATE_EMAIL (assuming I don't intend to output or process any emails declared invalid).

This will tell me whether or not the email is valid. If it is, there should be no need to pass it through FILTER_SANITIZE_EMAIL, because any illegal or unsafe characters would already have caused the email to be rejected as invalid, correct?

I also don't know of any email accepted by FILTER_VALIDATE_EMAIL that could be used for injection/XSS, because whitespace, parentheses () and semicolons would invalidate the email. Or am I wrong?
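One hedged counterpoint: the validator does accept some characters that matter in other contexts. A single quote, for instance, is legal in the local part, so a valid address may still need escaping when it is echoed into HTML. The sketch below uses a made-up address and assumes the value is later printed into an HTML page.

```php
<?php
// Hypothetical address: the apostrophe is legal in the local part.
$email = "o'brien@example.com";

// Returns the address itself because it is syntactically valid.
$validated = filter_var($email, FILTER_VALIDATE_EMAIL);

// Validation answers "is this an email?", not "is this safe in HTML?".
// Escape at output time, for the context being written into.
$escaped = htmlspecialchars($email, ENT_QUOTES, 'UTF-8');

var_dump($validated !== false);
echo $escaped, PHP_EOL;
```

Prepared statements cover the SQL side; output escaping of this kind covers the HTML side.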

(Note: I will be using prepared statements to insert the data in addition to this. I just wanted to clear this up.)

Alex asked Sep 03 '11 01:09


2 Answers

Here's how to insert only valid emails.

    <?php
    $original_email = 'jeff"@gmail.com';

    $clean_email = filter_var($original_email, FILTER_SANITIZE_EMAIL);

    // Accept the address only if sanitizing left it unchanged and it validates.
    if ($original_email === $clean_email && filter_var($original_email, FILTER_VALIDATE_EMAIL)) {
        // now you know the original email was safe to insert.
        // insert into database code goes here.
    }

FILTER_VALIDATE_EMAIL and FILTER_SANITIZE_EMAIL are both valuable filters, and they have different uses.

Validation tests whether the email has a valid format. Sanitizing strips the disallowed characters out of the email.

    <?php
    $email = "[email protected]";
    $clean_email = "";

    if (filter_var($email, FILTER_VALIDATE_EMAIL)) {
        $clean_email = filter_var($email, FILTER_SANITIZE_EMAIL);
    }

    // another implementation by request. Which is the way I would suggest
    // using the filters. Clean the content and then make sure it's valid
    // before you use it.

    $email = "[email protected]";
    $clean_email = filter_var($email, FILTER_SANITIZE_EMAIL);

    if (filter_var($clean_email, FILTER_VALIDATE_EMAIL)) {
        // email is valid and ready for use
    } else {
        // email is invalid and should be rejected
    }

PHP is open source, so questions like these can be answered by reading the source.

Source for FILTER_SANITIZE_EMAIL:

    /* {{{ php_filter_email */
    #define SAFE        "$-_.+"
    #define EXTRA       "!*'(),"
    #define NATIONAL    "{}|\\^~[]`"
    #define PUNCTUATION "<>#%\""
    #define RESERVED    ";/?:@&="

    void php_filter_email(PHP_INPUT_FILTER_PARAM_DECL)
    {
        /* Check section 6 of rfc 822 http://www.faqs.org/rfcs/rfc822.html */
        const unsigned char allowed_list[] = LOWALPHA HIALPHA DIGIT "!#$%&'*+-=?^_`{|}~@.[]";
        filter_map     map;

        filter_map_init(&map);
        filter_map_update(&map, 1, allowed_list);
        filter_map_apply(value, &map);
    }
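Read against that allowed list, the sanitizer's behaviour can be checked directly from PHP. This is a small sketch with invented inputs; the expected results follow from the characters named in `allowed_list` above.

```php
<?php
// Spaces, parentheses, semicolons and angle brackets are not in the
// allowed list, so the sanitizer drops them.
$messy = 'j e(f)f;<x>@gmail.com';
$cleaned = filter_var($messy, FILTER_SANITIZE_EMAIL);

// Apostrophes and plus signs are in the allowed list, so they survive.
$unusual = "o'brien+tag@example.com";
$kept = filter_var($unusual, FILTER_SANITIZE_EMAIL);

var_dump($cleaned);           // string(15) "jeffx@gmail.com"
var_dump($kept === $unusual); // bool(true)
```

Note that the sanitizer only removes characters; it makes no claim that what remains is a well-formed address.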

Source for FILTER_VALIDATE_EMAIL:

    void php_filter_validate_email(PHP_INPUT_FILTER_PARAM_DECL) /* {{{ */
    {
        const char regexp[] = "/^(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){255,})(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){65,}@)(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22))(?:\\.(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-+[a-z0-9]+)*\\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-+[a-z0-9]+)*)|(?:\\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\\]))$/iD";

        pcre       *re = NULL;
        pcre_extra *pcre_extra = NULL;
        int preg_options = 0;
        int         ovector[150]; /* Needs to be a multiple of 3 */
        int         matches;

        /* The maximum length of an e-mail address is 320 octets, per RFC 2821. */
        if (Z_STRLEN_P(value) > 320) {
            RETURN_VALIDATION_FAILED
        }

        re = pcre_get_compiled_regex((char *)regexp, &pcre_extra, &preg_options TSRMLS_CC);
        if (!re) {
            RETURN_VALIDATION_FAILED
        }
        matches = pcre_exec(re, NULL, Z_STRVAL_P(value), Z_STRLEN_P(value), 0, 0, ovector, 3);

        /* 0 means that the vector is too small to hold all the captured substring offsets */
        if (matches < 0) {
            RETURN_VALIDATION_FAILED
        }

    }

jbrahy answered Sep 18 '22 15:09


The "proper" way of doing this is to ask for the user's email twice (which is common and good practice). But to answer your question, FILTER_SANITIZE_EMAIL is not pointless. It's a filter that sanitizes emails, and it does its job well.

You need to understand that a validating filter only reports on its input (filter_var() returns the value if it is valid and false otherwise), whereas a sanitizing filter returns a modified copy of the value. The two do not serve the same purpose.
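A short sketch of that difference, using the address from the question. The variable names are illustrative. Strictly speaking, filter_var() hands back the address itself (not true) on successful validation, and a sanitizing filter returns a cleaned copy instead of altering the variable passed in.

```php
<?php
$input = 'jeff"@gmail.com';

// A validating filter returns the value on success and false on failure.
$valid   = filter_var('jeff@gmail.com', FILTER_VALIDATE_EMAIL);
$invalid = filter_var($input, FILTER_VALIDATE_EMAIL);

// A sanitizing filter returns a cleaned copy; $input itself is untouched.
$clean = filter_var($input, FILTER_SANITIZE_EMAIL);

var_dump($valid);   // string(14) "jeff@gmail.com"
var_dump($invalid); // bool(false)
var_dump($clean);   // string(14) "jeff@gmail.com"
var_dump($input);   // string(15) "jeff"@gmail.com"
```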

David Titarenco answered Sep 18 '22 15:09