 

PHP file_get_contents() behaves differently from a browser

I'm trying to download the contents of a web page using PHP. When I issue the command:

$f = file_get_contents("http://mobile.mybustracker.co.uk/mobile.php?searchMode=2");

It returns a page that reports that the server is down. Yet when I paste the same URL into my browser I get the expected page.

Does anyone have any idea what's causing this? Does file_get_contents transmit any headers that differentiate it from a browser request?

DaveG asked Mar 30 '10 at 20:03




1 Answer

Yes, there are differences: a browser typically sends many additional HTTP headers, and the headers sent by both probably don't have the same values.

Here, after a couple of tests, it seems that passing the Accept HTTP header is necessary.

This can be done with the third parameter of file_get_contents, which takes a stream context carrying additional options:

$opts = array('http' =>
    array(
        'method'  => 'GET',
        //'user_agent' => "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2) Gecko/20100301 Ubuntu/9.10 (karmic) Firefox/3.6",
        // Send the same Accept header Firefox sent; without it the server
        // answers with its "server is down" page.
        'header' => array(
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
        ),
    )
);
$context = stream_context_create($opts);

$f = file_get_contents("http://mobile.mybustracker.co.uk/mobile.php?searchMode=2", false, $context);
echo $f;

With this, I'm able to get the HTML code of the page.
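If several requests need the same headers, the options can be wrapped in a small helper. This is a minimal sketch: the function name `browser_like_context()` is hypothetical, and the default Accept value is simply the Firefox one from the code above, not a value the server is known to require.

```php
<?php
// Hypothetical helper: builds a stream context whose HTTP requests carry
// browser-like headers. The Accept value is the Firefox one quoted above;
// extra headers (e.g. a User-Agent) can be appended by the caller.
function browser_like_context(array $extraHeaders = array())
{
    $headers = array_merge(
        array('Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
        $extraHeaders
    );

    return stream_context_create(array(
        'http' => array(
            'method' => 'GET',
            'header' => $headers,
        ),
    ));
}

// Usage (performs a real HTTP request):
// $html = file_get_contents(
//     "http://mobile.mybustracker.co.uk/mobile.php?searchMode=2",
//     false,
//     browser_like_context(array('User-Agent: Mozilla/5.0'))
// );
```

Keeping the headers in one place makes it easy to add or drop a header while testing what the server actually checks.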


Notes:

  • I first tried passing the User-Agent header, but it doesn't seem to be necessary, which is why the corresponding line is commented out.
  • The value used for the Accept header is the one Firefox sent when I requested that page, before trying with file_get_contents.
    • Other values might work too, but I didn't test to determine which one is required.
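One way to narrow down which header (or value) the server really needs is to try candidate header sets in turn and check each response for the error page. A minimal sketch, assuming the error page can be recognised by a marker string; the function name `find_working_headers()`, the candidate labels and the `'server is down'` marker are all hypothetical. The fetcher is injectable so the logic can be exercised without network access.

```php
<?php
// Hypothetical sketch: try each candidate set of request headers in turn and
// return the label of the first set whose response body does NOT contain the
// error marker. Candidates and marker are assumptions to adapt to the real page.
function find_working_headers(string $url, array $candidates, string $errorMarker, ?callable $fetch = null)
{
    // Default fetcher: a real HTTP GET through file_get_contents().
    if ($fetch === null) {
        $fetch = function (string $url, array $headers) {
            $http = array('method' => 'GET');
            if ($headers) {
                $http['header'] = $headers; // only send extra headers when there are some
            }
            return file_get_contents($url, false, stream_context_create(array('http' => $http)));
        };
    }

    foreach ($candidates as $label => $headers) {
        $body = $fetch($url, $headers);
        if ($body !== false && strpos($body, $errorMarker) === false) {
            return $label; // first header set that produced the expected page
        }
    }
    return null; // none of the candidate header sets worked
}

// Usage (performs real HTTP requests; the marker text is a guess):
// $label = find_working_headers(
//     "http://mobile.mybustracker.co.uk/mobile.php?searchMode=2",
//     array(
//         'none'       => array(),
//         'user-agent' => array('User-Agent: Mozilla/5.0'),
//         'accept'     => array('Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
//     ),
//     'server is down'
// );
```

Ordering the candidates from fewest to most headers means the first match is roughly the smallest set the server accepts.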


For more information, take a look at:

  • file_get_contents
  • stream_context_create
  • Context options and parameters
  • HTTP context options: that's the interesting page here ;-)
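When a page reports an error like this, it can help to check whether the server sent an error status or a 200 with an error page. The HTTP context options include `ignore_errors`, which makes file_get_contents() return the body even for 4xx/5xx responses, and after an HTTP request PHP fills `$http_response_header` in the calling scope with the raw response header lines. A minimal sketch; `last_status_code()` is a hypothetical helper, not part of PHP.

```php
<?php
// Hypothetical helper: return the status code of the final response from
// the lines PHP stores in $http_response_header. Redirects add more lines,
// each response starting with its own "HTTP/..." status line, so the last
// match wins.
function last_status_code(array $responseHeaders)
{
    $code = null;
    foreach ($responseHeaders as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Usage (real request):
// $context = stream_context_create(array('http' => array('ignore_errors' => true)));
// $body = file_get_contents($url, false, $context);
// $status = last_status_code($http_response_header);
```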
Pascal MARTIN answered Oct 26 '22 at 02:10