I am just creating a registration form, and I am looking only to insert valid and safe emails into the database.
Several sites (including w3schools) recommend running FILTER_SANITIZE_EMAIL before FILTER_VALIDATE_EMAIL to be safe. However, this could turn an invalid submitted email into a valid one, which may not be what the user intended. For example:
The user has the email address [email protected], but accidentally inserts jeff"@gmail.com.
FILTER_SANITIZE_EMAIL would remove the ", making the email jeff@gmail.com, which FILTER_VALIDATE_EMAIL would say is valid even though it's not the user's actual email address.
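As a quick check of that behavior, here is a minimal sketch. The variable names are only for illustration, and the commented results are what I would expect from the filter source rather than a guarantee for every PHP version:

```php
<?php
// Input from the example above: a stray double quote typed before the @.
$submitted = 'jeff"@gmail.com';

// Validating the raw input rejects it (filter_var returns false).
$rawResult = filter_var($submitted, FILTER_VALIDATE_EMAIL);

// Sanitizing first strips the quote, and the result then validates.
$sanitized = filter_var($submitted, FILTER_SANITIZE_EMAIL);
$sanitizedResult = filter_var($sanitized, FILTER_VALIDATE_EMAIL);

var_dump($rawResult);       // bool(false)
var_dump($sanitized);       // string(14) "jeff@gmail.com"
var_dump($sanitizedResult); // string(14) "jeff@gmail.com"
```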
To avoid this problem, I plan to run only FILTER_VALIDATE_EMAIL (assuming I don't intend to output or process any emails declared invalid).
This will tell me whether or not the email is valid. If it is, there should be no need to pass it through FILTER_SANITIZE_EMAIL, because any illegal/unsafe characters would already have caused the email to be rejected as invalid, correct?
I also don't know of any email accepted by FILTER_VALIDATE_EMAIL that could be used for injection/XSS, since whitespace, parentheses (), and semicolons would invalidate the email. Or am I wrong?
(note: I will be using prepared statements to insert the data in addition to this, I just wanted to clear this up)
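One thing worth checking on the injection/XSS point: judging by the validation regex, an apostrophe appears to be accepted in the local part, so a valid email is not automatically safe for every output context. A small sketch, where the address is made up and htmlspecialchars is just one possible way to escape for HTML:

```php
<?php
// Illustrative address; the apostrophe is allowed in the local part.
$email = "o'brien@example.com";

// On success filter_var returns the address itself, otherwise false.
$valid = filter_var($email, FILTER_VALIDATE_EMAIL);

// Passing validation says nothing about the output context,
// so escape the value when echoing it into HTML.
$html = htmlspecialchars($email, ENT_QUOTES, 'UTF-8');

var_dump($valid !== false); // whether the address passed validation
echo $html, "\n";           // the apostrophe is emitted as an HTML entity
```

The same reasoning applies to SQL, which is why prepared statements remain necessary even for validated addresses.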
Definition and usage: the FILTER_SANITIZE_EMAIL filter removes all illegal characters from an email address.
What's the difference between the two? Sanitizing will remove any illegal character from the data. Validating will determine if the data is in proper form.
We can sanitize a URL by using FILTER_SANITIZE_URL. This filter removes all characters except letters, digits, and $-_.+!*'(),{}|\\^~[]`<>#%";/?:@&=.
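For comparison, a minimal sketch of FILTER_SANITIZE_URL on a made-up URL. The URL and variable names are assumptions for illustration only:

```php
<?php
// Made-up URL containing a space and a non-ASCII character (é).
$url = "https://example.com/caf\u{e9} menu?id=1";

// Characters outside the allowed set are dropped, not encoded.
$clean = filter_var($url, FILTER_SANITIZE_URL);

var_dump($clean); // "https://example.com/cafmenu?id=1"
```

Note that the filter only strips characters; it does not check that the result is a well-formed URL.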
Here's how to insert only valid emails.
<?php
$original_email = 'jeff"@gmail.com';
$clean_email = filter_var($original_email, FILTER_SANITIZE_EMAIL);

// Accept the address only if sanitizing left it unchanged and it validates.
if ($original_email === $clean_email && filter_var($original_email, FILTER_VALIDATE_EMAIL)) {
    // Now you know the original email was safe to insert.
    // Database insert code goes here.
}
FILTER_VALIDATE_EMAIL and FILTER_SANITIZE_EMAIL are both valuable filters with different uses.
Validation tests whether the email has a valid format. Sanitizing strips the disallowed characters out of the email.
<?php
$email = "[email protected]";
$clean_email = "";

if (filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $clean_email = filter_var($email, FILTER_SANITIZE_EMAIL);
}

// Another implementation, by request, and the way I would suggest
// using the filters: clean the content, then make sure it's valid
// before you use it.
$email = "[email protected]";
$clean_email = filter_var($email, FILTER_SANITIZE_EMAIL);

if (filter_var($clean_email, FILTER_VALIDATE_EMAIL)) {
    // email is valid and ready for use
} else {
    // email is invalid and should be rejected
}
PHP is open source, so these questions are easily answered by just using it.
Source for FILTER_SANITIZE_EMAIL:
/* {{{ php_filter_email */
#define SAFE        "$-_.+"
#define EXTRA       "!*'(),"
#define NATIONAL    "{}|\\^~[]`"
#define PUNCTUATION "<>#%\""
#define RESERVED    ";/?:@&="

void php_filter_email(PHP_INPUT_FILTER_PARAM_DECL)
{
	/* Check section 6 of rfc 822 http://www.faqs.org/rfcs/rfc822.html */
	const unsigned char allowed_list[] = LOWALPHA HIALPHA DIGIT "!#$%&'*+-=?^_`{|}~@.[]";
	filter_map map;

	filter_map_init(&map);
	filter_map_update(&map, 1, allowed_list);
	filter_map_apply(value, &map);
}
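To make the allowed list concrete, here is a rough PHP approximation of what the C code above does. The helper name sanitize_email_sketch is hypothetical, and the regex is my own transcription of allowed_list, not anything PHP ships:

```php
<?php
// Hypothetical helper: a regex approximation of allowed_list above.
function sanitize_email_sketch(string $value): string
{
    // Keep letters, digits and !#$%&'*+-=?^_`{|}~@.[] and drop everything else.
    return preg_replace('/[^a-zA-Z0-9!#$%&\'*+\-=?^_`{|}~@.\[\]]/', '', $value);
}

$samples = ['jeff"@gmail.com', 'a(b)c@exa mple.com', 'x<y>@z.org'];
foreach ($samples as $sample) {
    // Print the input, the sketch's output, and the built-in filter's output.
    printf(
        "%-20s %-18s %s\n",
        $sample,
        sanitize_email_sketch($sample),
        filter_var($sample, FILTER_SANITIZE_EMAIL)
    );
}
```

If the transcription is right, the last two columns agree for each sample: quotes, parentheses, spaces, and angle brackets are removed, and nothing is checked about the overall shape of the address.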
Source for FILTER_VALIDATE_EMAIL:
void php_filter_validate_email(PHP_INPUT_FILTER_PARAM_DECL) /* {{{ */
{
	const char regexp[] = "/^(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){255,})(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){65,}@)(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22))(?:\\.(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-+[a-z0-9]+)*\\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-+[a-z0-9]+)*)|(?:\\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\\]))$/iD";

	pcre *re = NULL;
	pcre_extra *pcre_extra = NULL;
	int preg_options = 0;
	int ovector[150]; /* Needs to be a multiple of 3 */
	int matches;

	/* The maximum length of an e-mail address is 320 octets, per RFC 2821. */
	if (Z_STRLEN_P(value) > 320) {
		RETURN_VALIDATION_FAILED
	}

	re = pcre_get_compiled_regex((char *)regexp, &pcre_extra, &preg_options TSRMLS_CC);
	if (!re) {
		RETURN_VALIDATION_FAILED
	}

	matches = pcre_exec(re, NULL, Z_STRVAL_P(value), Z_STRLEN_P(value), 0, 0, ovector, 3);

	/* 0 means that the vector is too small to hold all the captured substring offsets */
	if (matches < 0) {
		RETURN_VALIDATION_FAILED
	}
}
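The length rules in that source can be observed from PHP. This is a minimal sketch with made-up addresses. It assumes the lookahead with {65,} caps the local part at 64 characters, as the regex above suggests:

```php
<?php
// Local part of exactly 64 characters: expected to pass.
$ok = str_repeat('a', 64) . '@example.com';

// Local part of 65 characters: expected to fail the {65,}@ lookahead.
$tooLong = str_repeat('a', 65) . '@example.com';

// Whole address longer than 320 bytes: the source above rejects
// such input before the regex is even run.
$over320 = str_repeat('a', 64) . '@' . str_repeat('b', 260) . '.com';

var_dump(filter_var($ok, FILTER_VALIDATE_EMAIL) !== false);      // bool(true)
var_dump(filter_var($tooLong, FILTER_VALIDATE_EMAIL) !== false); // bool(false)
var_dump(filter_var($over320, FILTER_VALIDATE_EMAIL) !== false); // bool(false)
```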
The "proper" way of doing this is to ask for the user's email twice (which is common, good practice). But to answer your question, FILTER_SANITIZE_EMAIL is not pointless. It's a filter that sanitizes emails, and it does its job well.
You need to understand that a validating filter returns either the validated value or false, whereas a sanitizing filter returns a modified copy of the input. The two do not serve the same purpose.
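A short sketch of what filter_var hands back in each case. As far as I know it returns the address string (not true) on success, false on failure, and never changes the variable passed in. The sample input is made up:

```php
<?php
// Made-up input with a stray space in the local part.
$input = 'us er@example.com';

// Validation: false on failure, the address string on success.
$validated = filter_var($input, FILTER_VALIDATE_EMAIL);

// Sanitization: a cleaned copy is returned; $input itself is untouched.
$sanitized = filter_var($input, FILTER_SANITIZE_EMAIL);

// Compare against false explicitly rather than relying on truthiness.
if ($validated === false) {
    echo "rejected: {$input}\n";
}
echo "sanitized copy: {$sanitized}\n"; // user@example.com
echo "original still: {$input}\n";     // us er@example.com
```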