
How do I apply URL normalization rules in PHP?

Is there a pre-existing function or class for URL normalization in PHP?

Specifically, I'm looking for something that follows the semantics-preserving normalization rules laid out in the Wikipedia article on URL normalization (or whatever 'standard' I should be following):

  • Converting the scheme and host to lower case
  • Capitalizing letters in escape sequences
  • Adding trailing / (to directories, not files)
  • Removing the default port
  • Removing dot-segments
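Of these rules, dot-segment removal is the least obvious to get right by hand. RFC 3986 section 5.2.4 defines an exact algorithm for it ("remove_dot_segments"). A minimal PHP sketch of that algorithm follows, assuming PHP 8 (for str_starts_with()); the function names removeDotSegments() and dropLastSegment() are hypothetical and not part of any library.

```php
<?php
// Sketch of RFC 3986 section 5.2.4 "remove_dot_segments" (requires PHP 8).
// Function names are hypothetical, not from any library.

// Removes the last segment and its preceding "/" (if any) from the output buffer.
function dropLastSegment(string $output): string
{
    $slash = strrpos($output, '/');
    return $slash === false ? '' : substr($output, 0, $slash);
}

function removeDotSegments(string $path): string
{
    $input = $path;
    $output = '';
    while ($input !== '') {
        if (str_starts_with($input, '../')) {          // step A: drop leading "../"
            $input = substr($input, 3);
        } elseif (str_starts_with($input, './')) {     // step A: drop leading "./"
            $input = substr($input, 2);
        } elseif (str_starts_with($input, '/./')) {    // step B: "/./x" -> "/x"
            $input = substr($input, 2);
        } elseif ($input === '/.') {                   // step B: trailing "/." -> "/"
            $input = '/';
        } elseif (str_starts_with($input, '/../')) {   // step C: "/../x" -> "/x", pop a segment
            $input = substr($input, 3);
            $output = dropLastSegment($output);
        } elseif ($input === '/..') {                  // step C: trailing "/.." -> "/", pop a segment
            $input = '/';
            $output = dropLastSegment($output);
        } elseif ($input === '.' || $input === '..') { // step D: lone "." or ".."
            $input = '';
        } else {                                       // step E: move first segment to output
            $pos = strpos($input, '/', $input[0] === '/' ? 1 : 0);
            if ($pos === false) {
                $pos = strlen($input);
            }
            $output .= substr($input, 0, $pos);
            $input = substr($input, $pos);
        }
    }
    return $output;
}
```

For example, removeDotSegments('/a/b/c/./../../g') gives '/a/g', which matches the worked example in the RFC.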

Right now, I'm thinking that I'll just use parse_url() and apply the rules individually, but I'd prefer to avoid reinventing the wheel.
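Doing it by hand with parse_url() could look something like the sketch below. It covers lowercasing the scheme and host, uppercasing percent-escapes, and dropping the default port, then reassembles the URL; dot-segment removal and trailing slashes are deliberately left out. The function name normalizeUrlBasics() and the small default-port table are assumptions for illustration only (PHP 7.4+ for arrow functions).

```php
<?php
// Hypothetical helper: applies the case, escape and default-port rules using
// parse_url(). Dot-segments and trailing slashes are NOT handled here.
function normalizeUrlBasics(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false) {
        throw new InvalidArgumentException("Unparseable URL: $url");
    }

    // Scheme and host are case-insensitive, so lowercase them.
    $scheme = isset($parts['scheme']) ? strtolower($parts['scheme']) : null;
    $host   = isset($parts['host']) ? strtolower($parts['host']) : null;

    // Drop the port if it equals the scheme's default (assumed, incomplete table).
    $defaultPorts = ['http' => 80, 'https' => 443, 'ftp' => 21];
    $port = $parts['port'] ?? null; // parse_url() returns the port as an int
    if ($port !== null && $scheme !== null && ($defaultPorts[$scheme] ?? null) === $port) {
        $port = null;
    }

    // Uppercase the hex digits of percent-escapes, e.g. "%3a" -> "%3A".
    $upperEscapes = fn(string $s): string => preg_replace_callback(
        '/%[0-9a-fA-F]{2}/',
        fn(array $m): string => strtoupper($m[0]),
        $s
    );

    // Reassemble the components in RFC 3986 order.
    $result = $scheme !== null ? $scheme . ':' : '';
    if ($host !== null) {
        $result .= '//';
        if (isset($parts['user'])) {
            $result .= $parts['user'];
            if (isset($parts['pass'])) {
                $result .= ':' . $parts['pass'];
            }
            $result .= '@';
        }
        $result .= $host . ($port !== null ? ':' . $port : '');
    }
    $result .= $upperEscapes($parts['path'] ?? '');
    if (isset($parts['query'])) {
        $result .= '?' . $upperEscapes($parts['query']);
    }
    if (isset($parts['fragment'])) {
        $result .= '#' . $upperEscapes($parts['fragment']);
    }
    return $result;
}
```

With this sketch, normalizeUrlBasics('HTTP://Example.COM:80/a%2fb?q=%3a#x') returns 'http://example.com/a%2Fb?q=%3A#x', while a non-default port such as :8443 is kept.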

asked Nov 14 '10 01:11 by Yahel



1 Answer

The PEAR Net_URL2 library looks like it'll do at least part of what you want. It removes dot-segments, fixes capitalization, and drops the default port:

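<?php
require_once 'Net/URL2.php'; // installed via: pear install Net_URL2
$url = new Net_URL2('HTTP://example.com:80/a/../b/c');
echo $url->getNormalizedURL(), "\n"; // scheme lowercased, :80 dropped, /a/.. removed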

emits:

http://example.com/b/c

I doubt there's a general-purpose mechanism for adding trailing slashes to directories, because you'd need a way to map URLs to directories, which is hard to do generically. But it's close.
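If you do control the server and URLs map directly onto files under a document root, the check becomes possible. A minimal sketch, assuming a plain static-file mapping (URL path equals file path under $docRoot) and PHP 8 for str_ends_with(); the function name addTrailingSlashForDirectory() is hypothetical, and rewrites, routers, or aliases would break the assumption:

```php
<?php
// Hypothetical helper: append "/" only if the URL path maps to an existing
// directory under $docRoot. Assumes a static-file mapping; run dot-segment
// removal first so "../" cannot escape the document root.
function addTrailingSlashForDirectory(string $path, string $docRoot): string
{
    if ($path === '') {
        return '/'; // for http(s), an empty path is equivalent to "/"
    }
    if (str_ends_with($path, '/')) {
        return $path; // already has a trailing slash
    }
    // Decode %XX escapes before consulting the filesystem.
    $fsPath = rtrim($docRoot, '/') . rawurldecode($path);
    return is_dir($fsPath) ? $path . '/' : $path;
}
```

Because it hits the filesystem, this only works on the machine serving the files, which is exactly why a generic library can't offer it.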

References:

  • http://pear.php.net/package/Net_URL2
  • http://pear.php.net/package/Net_URL2/docs/latest/Net_URL2/Net_URL2.html
answered Oct 20 '22 21:10 by Bharat Mediratta