
PHP: cURL and keep track of all redirections

Tags: php, curl, libcurl

I want to fetch a URL with cURL and keep track of each individual URL it is redirected through. So far the only way I have found is to make recursive cURL calls, one per hop, which is not ideal. Perhaps I am missing some easy option. Thoughts?

 $url = "some url with redirects";
 $ch = curl_init($url);
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the response instead of printing it
 curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects automatically
 curl_setopt($ch, CURLOPT_HEADER, true);          // include response headers in the output
 curl_setopt($ch, CURLOPT_NOBODY, false);         // fetch the body as well
 curl_setopt($ch, CURLOPT_TIMEOUT, 10);
 curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
 curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1) Gecko/20061024 BonEcho/2.0");

 $html = curl_exec($ch);
 $info = array();
 if (!curl_errno($ch))
 {
      // Details of the transfer; only the final URL is reported here
      $info = curl_getinfo($ch);
      echo "<pre>";
      print_r($info);
      echo "</pre>";
 }

And I get a response like this:

Array
(
    [url] => THE LAST URL THAT WAS HIT
    [content_type] => text/html; charset=utf-8
    [http_code] => 200
    [header_size] => 1942
    [request_size] => 1047
    [filetime] => -1
    [ssl_verify_result] => 0
    [redirect_count] => 2   <---- I WANT THESE
    [total_time] => 0.799589
    [namelookup_time] => 0.000741
    [connect_time] => 0.104206
    [pretransfer_time] => 0.104306
    [size_upload] => 0
    [size_download] => 49460
    [speed_download] => 61856
    [speed_upload] => 0
    [download_content_length] => 49460
    [upload_content_length] => 0
    [starttransfer_time] => 0.280781
    [redirect_time] => 0.400723
)
Thomas asked Jul 13 '10 01:07



1 Answer

With libcurl, you can use the CURLINFO_REDIRECT_URL getinfo variable to find out the URL it would have redirected to if redirect following (CURLOPT_FOLLOWLOCATION) were enabled. This allows programs to easily traverse the redirects themselves, one hop at a time.

This approach is easier and more robust than parsing the Location: headers yourself, as the others have suggested here, because then your code must resolve relative paths and so on. CURLINFO_REDIRECT_URL does that for you automatically.

The PHP/CURL binding added support for this feature in PHP 5.3.7:

$url = curl_getinfo($ch, CURLINFO_REDIRECT_URL);

The commit that fixed this:

https://github.com/php/php-src/commit/689268a0ba4259c8f199cae6343b3d17cab9b6a5
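
As a rough illustration of traversing the redirects yourself, here is a minimal sketch that leaves CURLOPT_FOLLOWLOCATION off and asks for CURLINFO_REDIRECT_URL after each request. The function names next_redirect and trace_redirects, the $maxHops limit, and the optional $next callable (included only so the loop can be exercised without a network) are assumptions made for this example and are not part of PHP or libcurl.

```php
<?php
// Hypothetical helper: request $url once WITHOUT following redirects and
// return the absolute URL the server redirects to, or null if there is none.
function next_redirect(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo the body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // we walk the chain ourselves
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);

    if (curl_exec($ch) === false) {
        return null; // transfer failed; treat it as the end of the chain
    }

    // Empty/false when the response was not a redirect.
    $target = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    return $target ? $target : null;
}

// Hypothetical helper: collect every URL visited, starting with $url.
// $maxHops caps the number of redirects so a redirect loop cannot run forever.
// $next is an assumption for testability; by default it is next_redirect().
function trace_redirects(string $url, int $maxHops = 10, ?callable $next = null): array
{
    $next = $next ?? 'next_redirect';
    $chain = [$url];

    while (count($chain) <= $maxHops) {
        $target = $next($url);
        if ($target === null) {
            break; // no further redirect: $url is the final destination
        }
        $chain[] = $target;
        $url = $target;
    }

    return $chain;
}
```

Calling print_r(trace_redirects($url)) would then list the starting URL followed by each redirect target in order. Each hop is a separate request, so this costs about the same as the recursive calls in the question, but the loop stays flat and libcurl handles resolving relative Location values.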

Daniel Stenberg answered Oct 20 '22 23:10