
Download the contents of a URL in PHP even if it returns a 404

I want to download the contents of a URL using PHP, even if the HTTP response code is 404. file_get_contents will error out, and I wasn't able to find an answer using Google. How can I do this?

Jake Petroules asked Jul 16 '11 16:07




3 Answers

You have to configure the stream wrapper to ignore errors:

ignore_errors (boolean): Fetch the content even on failure status codes. Defaults to FALSE.

In other words, do

echo file_get_contents(
    'http://stackoverflow.com/foo/bar',
    false,
    stream_context_create([
        'http' => [
            'ignore_errors' => true,
        ],
    ])
);

and you will get the 404 page.

If you want this to be the default behavior for HTTP streams, use

stream_context_set_default([
    'http' => [
        'ignore_errors' => true,
    ],
]);

Any subsequent call that uses the HTTP stream wrapper will then pick up these settings, so you can simply do

echo file_get_contents('http://stackoverflow.com/foo/bar');

If you also want the response headers, just do

print_r($http_response_header);

after the call. PHP (re-)populates this variable in the local scope after each call made through the HTTP stream wrapper.
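To act on the status code rather than just print the headers, the status line can be parsed out of that array. The following is only a minimal sketch, assuming PHP 7.1 or later; `parse_status_code` and `fetch_ignoring_errors` are hypothetical helper names, not part of PHP.

```php
<?php
// Hypothetical helper: returns the status code of the final response
// found in a list of raw header lines, or null if there is no status line.
function parse_status_code(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // When redirects are followed, every hop contributes its own
        // status line, so keep overwriting until the last one.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: fetches a URL and returns [status, body],
// keeping the body even for 4xx/5xx responses.
function fetch_ignoring_errors(string $url): array
{
    $context = stream_context_create(['http' => ['ignore_errors' => true]]);
    $body = @file_get_contents($url, false, $context);
    // The HTTP wrapper sets $http_response_header in the local scope;
    // it stays undefined if the connection itself failed.
    $headers = $http_response_header ?? [];
    return [parse_status_code($headers), $body];
}
```

Calling `[$status, $body] = fetch_ignoring_errors('http://stackoverflow.com/foo/bar');` should then give you the error page in `$body` together with its status code, so you can tell a 404 from a 200 yourself.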

Gordon answered Oct 19 '22 23:10


By default, file_get_contents returns the body only for successful responses. On a 4xx or 5xx status it emits a warning and returns false.

With cURL you can read the headers and the content separately.

As of PHP 5.0, you can also pass a context to file_get_contents, which lets you do this without relying on cURL (see Gordon's answer).

Paul answered Oct 19 '22 23:10


Use cURL instead. It allows much greater control, and will let you read any content retrieved and the status code.
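A minimal sketch of what that could look like, assuming the cURL extension is installed; `fetch_with_curl` is a hypothetical helper name, and the options shown are just one reasonable choice:

```php
<?php
// Hypothetical helper: fetches a URL with cURL and returns the status
// code and body, whether or not the status indicates an error.
function fetch_with_curl(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_FAILONERROR    => false, // the default: keep the body on 4xx/5xx
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // Transport-level failure (DNS, connection refused, timeout, ...),
        // which is distinct from an HTTP error status.
        throw new RuntimeException('Transfer failed: ' . curl_error($ch));
    }

    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ['status' => $status, 'body' => $body];
}
```

For a missing page, `fetch_with_curl('http://stackoverflow.com/foo/bar')` should return the 404 page in `body` with `status` set to 404, since cURL only treats HTTP error codes as failures when `CURLOPT_FAILONERROR` is enabled.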

sagi answered Oct 19 '22 22:10