In the days of link shorteners and Ajax, many different links can ultimately point to the same content. I'm wondering what the best way is to get the final, canonical link for a website in PHP, ideally with a library. I was unable to find anything on Google or GitHub.
I have seen this example code, but it doesn't handle things like rel="canonical" link tags or default SSL ports: http://w-shadow.com/blog/2008/07/05/how-to-get-redirect-url-in-php/
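Handling default ports mostly comes down to normalizing a URL before comparing it. Below is a minimal sketch, assuming PHP 7.0+ and using only the built-in parse_url(); normalizeDefaultPort is a hypothetical helper, not part of any library mentioned here. It drops :80 for http and :443 for https and lowercases the scheme and host (paths are case-sensitive, so they are left alone).

```php
<?php
// Hypothetical helper: drops the scheme's default port and lowercases scheme
// and host, so https://Example.com:443/a and https://example.com/a compare equal.
function normalizeDefaultPort(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // leave anything we cannot parse untouched
    }

    $scheme = strtolower($parts['scheme']);
    $host = strtolower($parts['host']);
    $defaults = ['http' => 80, 'https' => 443];

    // Keep the port only if it is not the default for this scheme.
    $port = '';
    if (isset($parts['port']) && ($defaults[$scheme] ?? null) !== $parts['port']) {
        $port = ':' . $parts['port'];
    }

    $userInfo = '';
    if (isset($parts['user'])) {
        $userInfo = $parts['user'] . (isset($parts['pass']) ? ':' . $parts['pass'] : '') . '@';
    }

    return $scheme . '://' . $userInfo . $host . $port
        . ($parts['path'] ?? '')
        . (isset($parts['query']) ? '?' . $parts['query'] : '')
        . (isset($parts['fragment']) ? '#' . $parts['fragment'] : '');
}
```

For example, normalizeDefaultPort('https://Example.com:443/a?b=1') yields https://example.com/a?b=1, while a non-default port such as :8080 is kept.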
Facebook seems to handle this pretty well; you can see how it follows 301s, rel="canonical", etc. To see examples of how Facebook handles it, use its Open Graph debugging tool:
https://developers.facebook.com/tools/debug
and enter these links:
http://dlvr.it/xxb0W
https://twitter.com/#!/twitter/statuses/136946408275193856
Is there a PHP library out there that already has this pre-built, one that checks these headers, resolves 301 redirects, parses rel="canonical", detects redirect loops, and returns the best resulting URL to use?
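The requirements listed above can be sketched in plain PHP. This is a minimal, hypothetical illustration, not the API of any existing library: resolveFinalUrl, absolutize, findCanonical and the injected $fetch callable (assumed to return status, location and body for a single request, without following redirects itself) are names made up for this example. Injecting the fetcher keeps the redirect, loop-detection and canonical logic independent of any HTTP client. The regex-based canonical lookup and the simplified relative-URL handling are deliberate shortcuts.

```php
<?php
// Hypothetical sketch. $fetch is an assumed callable:
//   function (string $url): array with keys 'status' (int), 'location' (?string), 'body' (string)
// It must NOT follow redirects itself, so every hop goes through the loop check.
function resolveFinalUrl(string $url, callable $fetch, int $maxHops = 10): string
{
    $seen = [];
    for ($hop = 0; $hop <= $maxHops; $hop++) {
        // Visiting the same URL twice means the redirects form a loop.
        if (isset($seen[$url])) {
            throw new RuntimeException("Redirect loop detected at $url");
        }
        $seen[$url] = true;

        $res = $fetch($url);
        $isRedirect = in_array($res['status'], [301, 302, 303, 307, 308], true);
        if ($isRedirect && !empty($res['location'])) {
            $url = absolutize($url, $res['location']);
            continue;
        }

        // Final page reached: prefer its <link rel="canonical">, if present.
        // The canonical URL is returned as-is, without being fetched.
        $canonical = findCanonical($res['body'] ?? '');
        return $canonical !== null ? absolutize($url, $canonical) : $url;
    }
    throw new RuntimeException("Too many redirects (limit: $maxHops)");
}

// Resolves a possibly relative reference against $base. Simplified on purpose:
// "../" segments and query-only references such as "?page=2" are not handled.
function absolutize(string $base, string $ref): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $ref)) {
        return $ref; // already absolute
    }
    $b = parse_url($base);
    if (strpos($ref, '//') === 0) {
        return $b['scheme'] . ':' . $ref; // protocol-relative
    }
    $origin = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    if (strpos($ref, '/') === 0) {
        return $origin . $ref; // root-relative
    }
    $dir = preg_replace('#/[^/]*$#', '/', $b['path'] ?? '/');
    return $origin . $dir . $ref; // path-relative
}

// Regex shortcut for <link rel="canonical" href="...">; an HTML parser is more robust.
function findCanonical(string $html): ?string
{
    if (!preg_match_all('/<link\b[^>]*>/i', $html, $tags)) {
        return null;
    }
    foreach ($tags[0] as $tag) {
        if (preg_match('/\brel\s*=\s*["\']?canonical\b/i', $tag)
            && preg_match('/\bhref\s*=\s*["\']([^"\']+)["\']/i', $tag, $m)) {
            return html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5);
        }
    }
    return null;
}
```

In practice $fetch would wrap an HTTP client configured not to follow redirects on its own; returning the canonical URL without fetching it is a design choice here, and re-checking it with another request is an equally reasonable option.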
As an alternative, I am open to APIs that can be used, but would prefer something that runs on my own server.
Since I wasn't able to find any libraries that really did what I was looking for, and I was hoping to do more than just follow HTTP redirects, I have gone ahead and created a library that accomplishes the goals and released it under the MIT license. You can get it here:
https://github.com/mattwright/URLResolver.php
URLResolver.php is a PHP class that attempts to resolve URLs to a final, canonical link.
I am certainly not an expert on the rules of HTTP redirection, so if anyone has suggestions on how to improve this library, they would be greatly appreciated. I have tested it on thousands of URLs and it seems to do pretty well. I followed Mario's advice and used the PHP Simple HTML DOM Parser library where needed.
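Using Guzzle (a well-known, robust HTTP client), you can do it like this:
```php
<?php
// Guzzle 3.x API (Guzzle\ namespace); Guzzle 6+ uses the GuzzleHttp\ namespace instead.
require 'vendor/autoload.php';

use Guzzle\Http\Client as GuzzleClient;
use Guzzle\Plugin\History\HistoryPlugin;

function resolveUrl($url)
{
    $client = new GuzzleClient($url);
    // Records each request/response, so the redirect chain can be inspected if needed.
    $history = new HistoryPlugin();
    $client->addSubscriber($history);

    // HEAD avoids downloading the body; the client follows redirects itself.
    $response = $client->head($url)->send();
    if (!$response->isSuccessful()) {
        throw new \Exception(sprintf("Url %s is not a valid URL or website is down.", $url));
    }

    // URL of the final response, after all redirects were followed.
    return $response->getEffectiveUrl();
}
```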
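If pulling in Guzzle is not an option, PHP's curl extension can follow redirects on its own. This is a swapped-in alternative, not part of the answer above: a minimal sketch assuming the curl extension is installed, with resolveWithCurl as a hypothetical function name. Like the Guzzle version, it only follows HTTP redirects and does not read rel="canonical".

```php
<?php
// Hypothetical helper: follows HTTP redirects with curl and returns the final URL.
function resolveWithCurl(string $url, int $maxRedirects = 10): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,          // send HEAD, skip the body
        CURLOPT_FOLLOWLOCATION => true,          // follow Location headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // fail past this limit, which also stops loops
        CURLOPT_RETURNTRANSFER => true,          // return instead of printing
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 20,
    ]);

    // With CURLOPT_NOBODY the result is an empty string on success, false on failure.
    if (curl_exec($ch) === false) {
        throw new RuntimeException("Could not resolve $url: " . curl_error($ch));
    }

    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    if ($status >= 400) {
        throw new RuntimeException("Url $url returned HTTP $status");
    }

    // URL of the last request curl made, after all redirects.
    return curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}
```

Some servers answer HEAD requests with 405 Method Not Allowed; retrying the request as a GET is a common workaround.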