
Validating URL with and without protocol with filter_var

Tags: validation, php

I am attempting to validate URLs using PHP's filter_var() function. Per http://php.net/manual/en/filter.filters.validate.php:

Validates value as URL (according to http://www.faqs.org/rfcs/rfc2396), optionally with required components. Beware a valid URL may not specify the HTTP protocol http:// so further validation may be required to determine the URL uses an expected protocol, e.g. ssh:// or mailto:. Note that the function will only find ASCII URLs to be valid; internationalized domain names (containing non-ASCII characters) will fail.

Regarding "Beware a valid URL may not specify the HTTP protocol": my tests below indicate that an HTTP protocol is required (the URL 'stackoverflow.com/' is NOT considered valid). How am I misinterpreting the documentation?

Also, how are URLs such as https://https://stackoverflow.com/ prevented from validating true?

PS. Any comments on my approach of sanitizing the protocol would be appreciated.

<?php
function filterURL($url) {
    echo("URL '{$url}' is ".(filter_var($url, FILTER_VALIDATE_URL)?'':' NOT ').'considered valid.<br>');
}
function sanitizeURL($url) {
    return (strtolower(substr($url,0,7))=='http://' || strtolower(substr($url,0,8))=='https://')?$url:'http://'.$url;
}

filterURL('http://stackoverflow.com/');
filterURL('https://stackoverflow.com/');
filterURL('//stackoverflow.com/');
filterURL('stackoverflow.com/');
filterURL(sanitizeURL('http://stackoverflow.com/'));
filterURL(sanitizeURL('https://stackoverflow.com/'));
filterURL(sanitizeURL('stackoverflow.com/'));

filterURL('https://https://stackoverflow.com/');
?>

OUTPUT:

URL 'http://stackoverflow.com/' is considered valid.
URL 'https://stackoverflow.com/' is considered valid.
URL '//stackoverflow.com/' is NOT considered valid.
URL 'stackoverflow.com/' is NOT considered valid.
URL 'http://stackoverflow.com/' is considered valid.
URL 'https://stackoverflow.com/' is considered valid.
URL 'http://stackoverflow.com/' is considered valid.
URL 'https://https://stackoverflow.com/' is considered valid.
Asked Jun 01 '15 at 13:06 by user1032531


1 Answer

FILTER_VALIDATE_URL uses parse_url(), which unfortunately parses 'https://https://stackoverflow.com/' as a valid URL (and it really is a valid one according to the URI RFC: the host is "https" and the rest is the path):

var_dump(parse_url('https://https://stackoverflow.com/'));

array(3) { 
  ["scheme"]=> string(5) "https" 
  ["host"]=> string(5) "https"
  ["path"]=> string(20) "//stackoverflow.com/" 
}

You could change your sanitizeURL function to:

function sanitizeURL($url) {
  return (parse_url($url, PHP_URL_SCHEME)) ? $url : 'http://' . $url;
}
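
One case the version above does not cover is scheme-relative input such as '//stackoverflow.com/' from the question: prepending 'http://' to it yields 'http:////stackoverflow.com/'. The following is only a sketch of one way to handle that; the function name addDefaultScheme and the $defaultScheme parameter are hypothetical and not part of the original code.

```php
<?php
// Hypothetical variant of sanitizeURL(); the name and the default scheme are assumptions.
function addDefaultScheme(string $url, string $defaultScheme = 'http'): string
{
    $url = trim($url);

    // Scheme-relative input ("//host/path") only lacks the "scheme:" part.
    if (strncmp($url, '//', 2) === 0) {
        return $defaultScheme . ':' . $url;
    }

    // With a component argument, parse_url() returns null when that component
    // is absent and false when the URL is seriously malformed.
    $scheme = parse_url($url, PHP_URL_SCHEME);

    // Leave the URL alone if it already has a scheme; otherwise prepend one.
    return $scheme ? $url : $defaultScheme . '://' . $url;
}
```

Like sanitizeURL, this leaves 'https://https://stackoverflow.com/' unchanged, because that string already has a scheme, so the host check is still needed.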

but you still have to check that the host name is neither http nor https:

function filterURL($url) {
  echo("URL '{$url}' is ".((filter_var($url, FILTER_VALIDATE_URL) !== false && (parse_url($url, PHP_URL_HOST) !== 'http' && parse_url($url, PHP_URL_HOST) !== 'https'))?'':' NOT ').'considered valid.<br>');
}
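
If you need a boolean result rather than echoed text, the same checks can be wrapped in a small predicate. This is a minimal sketch under two assumptions that go beyond the code above: only http and https are acceptable schemes, and the helper name isValidHttpUrl is made up for illustration.

```php
<?php
// Hypothetical helper; the name and the "http/https only" rule are assumptions.
function isValidHttpUrl(string $url): bool
{
    // Reject anything filter_var() itself refuses (e.g. a missing scheme).
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    $host   = strtolower((string) parse_url($url, PHP_URL_HOST));

    // FILTER_VALIDATE_URL also accepts schemes such as ftp: or mailto:,
    // so the expected schemes are checked explicitly.
    if (!in_array($scheme, ['http', 'https'], true)) {
        return false;
    }

    // A host named "http" or "https" most likely comes from a doubled
    // scheme such as 'https://https://stackoverflow.com/'.
    return !in_array($host, ['http', 'https'], true);
}
```

Comparing the scheme and host in lower case means that input such as 'HTTPS://HTTPS://stackoverflow.com/' is treated the same way as the lower-case form.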
Answered Sep 28 '22 at 00:09 by Tomasz Racia