
PHP get_headers() reports different headers than CURL

How can get_headers() return different headers than cURL for the same URL? Here is my code:

header("Content-type: text/plain");
$url = 'http://www.foxbusiness.com/index.html';

echo "get_headers() headers:\n\n";
$headers = get_headers($url);
print_r($headers);

echo "\n\nCURL headers\n\n";
$curl = curl_init();
curl_setopt_array( $curl, array(
    CURLOPT_HEADER => true,
    CURLOPT_NOBODY => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_URL => $url ) );
$headers = explode( "\n", curl_exec( $curl ) );
curl_close( $curl );
print_r($headers);

This is the result:

get_headers() headers:

Array
(
    [0] => HTTP/1.0 403 Forbidden
    [1] => Server: AkamaiGHost
    [2] => Mime-Version: 1.0
    [3] => Content-Type: text/html
    [4] => Content-Length: 283
    [5] => Expires: Fri, 31 Aug 2012 07:29:14 GMT
    [6] => Date: Fri, 31 Aug 2012 07:29:14 GMT
    [7] => Connection: close
)


CURL headers

Array
(
    [0] => HTTP/1.1 200 OK
    [1] => Server: Apache
    [2] => X-FoxNews-EdgeTTL: 2m
    [3] => Content-Type: text/html;charset=UTF-8
    [4] => Cache-Control: max-age=64
    [5] => Date: Fri, 31 Aug 2012 07:29:14 GMT
    [6] => Connection: keep-alive
    [7] => 
    [8] => 
)
Mike asked Aug 31 '12 07:08


2 Answers

get_headers() sends a GET request by default, whereas you configured cURL to send a HEAD request (that is what CURLOPT_NOBODY does). Start by making the two requests identical: set an HTTP stream context that uses HEAD as the request method.

Also, the server seems to expect a User-Agent header, so either set user_agent in php.ini or add it to the stream context.

The following should work:

stream_context_set_default(
    array(
        'http' => array(
            'method' => 'HEAD',
            'user_agent' => "PHP"
        )
    )
);

See http://codepad.viper-7.com/cOO9XS

Note that stream_context_set_default modifies the global default stream context, so once you have called the above, any other function using the http stream wrapper will also send HEAD requests. Before PHP 7.1, get_headers, unlike file_get_contents for example, did not accept a custom stream context as an argument; since 7.1 it takes one as its third parameter. If you rely on the default context, make sure you change the method back to GET after you have fetched the headers.
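
As a rough sketch of the "change it back" advice, the two helpers below show one way to keep the HEAD setting from leaking into later requests. The names with_head_default and head_headers are made up for illustration and are not PHP built-ins; the code assumes PHP 7 or later, and the second helper assumes PHP 7.1+, where get_headers() accepts a context argument.

```php
<?php
// Hypothetical helper (not a PHP built-in): run $fn while the global default
// HTTP context uses HEAD, then switch the method back to GET.
function with_head_default(callable $fn, string $userAgent = 'PHP')
{
    stream_context_set_default([
        'http' => ['method' => 'HEAD', 'user_agent' => $userAgent],
    ]);
    try {
        return $fn();
    } finally {
        // Restore GET so that later http:// stream calls fetch bodies again.
        // The user_agent option set above stays in the default context.
        stream_context_set_default(['http' => ['method' => 'GET']]);
    }
}

// Hypothetical helper for PHP 7.1+: use a per-call context and leave the
// global default context untouched.
function head_headers(string $url, string $userAgent = 'PHP')
{
    $context = stream_context_create([
        'http' => ['method' => 'HEAD', 'user_agent' => $userAgent],
    ]);
    return get_headers($url, false, $context);
}

// Usage (both perform a network request, so they are commented out here):
// $a = with_head_default(function () {
//     return get_headers('http://www.foxbusiness.com/index.html');
// });
// $b = head_headers('http://www.foxbusiness.com/index.html');
```

The per-call context is probably the cleaner choice where it is available, since nothing global has to be restored afterwards.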

Gordon answered Sep 27 '22 19:09


Set a User-Agent header before calling get_headers():

stream_context_set_default(
    array(
        'http' => array(
            'method' => 'HEAD',
            'header' => "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1\r\n"
        )
    )
);

You might as well specify HEAD too, since you only want the headers. With this change, get_headers() returns the expected 200 response.

OUTPUT

get_headers() headers:

Array
(
    [0] => HTTP/1.0 200 OK
    [1] => Server: Apache
    [2] => X-FoxNews-EdgeTTL: 2m
    [3] => Content-Type: text/html;charset=UTF-8
    [4] => Cache-Control: max-age=76
    [5] => Date: Fri, 31 Aug 2012 07:53:24 GMT
    [6] => Connection: close
)


CURL headers

Array
(
    [0] => HTTP/1.1 200 OK
    [1] => Server: Apache
    [2] => X-FoxNews-EdgeTTL: 2m
    [3] => Content-Type: text/html;charset=UTF-8
    [4] => Cache-Control: max-age=76
    [5] => Date: Fri, 31 Aug 2012 07:53:24 GMT
    [6] => Connection: keep-alive
    [7] => 
    [8] => 
)
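
The two empty entries at the end of the cURL array are not extra headers. They appear because the raw response is split on "\n" only, while HTTP header lines end in "\r\n" and the header block is closed by a blank line. A minimal sketch of a cleaner split, assuming a made-up helper name split_raw_headers and a shortened sample response:

```php
<?php
// Hypothetical helper: split a raw header block, as returned by curl_exec()
// with CURLOPT_HEADER, into clean lines comparable to get_headers() output.
function split_raw_headers(string $raw): array
{
    // Header lines end in CRLF; accept a bare LF as well.
    $lines = preg_split('/\r?\n/', $raw);
    // Drop empty lines, such as the blank line that terminates the block.
    return array_values(array_filter($lines, function ($line) {
        return $line !== '';
    }));
}

// Shortened sample input, not a real captured response.
$raw = "HTTP/1.1 200 OK\r\nServer: Apache\r\nConnection: keep-alive\r\n\r\n";
print_r(split_raw_headers($raw));
```

In the question's code this would replace the explode( "\n", ... ) call, so that both arrays can be compared entry by entry.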
sberry answered Sep 27 '22 20:09