
PHP file_get_contents very slow when using full url

Tags:

php

I am working with a script (that I did not originally create) that generates a PDF file from an HTML page. The problem is that it now takes a very long time, around 1-2 minutes, to process. Supposedly it worked fine originally, but it has slowed down within the past couple of weeks.

The script calls file_get_contents on a PHP script, writes the result to an HTML file on the server, and then runs the PDF generator app on that file.

I seem to have narrowed down the problem to the file_get_contents call on a full url, rather than a local path.

When I use

$content = file_get_contents('test.txt'); 

it processes almost instantaneously. However, if I use the full url

$content = file_get_contents('http://example.com/test.txt'); 

it takes anywhere from 30-90 seconds to process.

It's not limited to our server; it is slow when accessing any external URL, such as http://www.google.com. I believe the script calls the full URL because it needs query string variables that don't work if you read the file locally.

I also tried fopen, readfile, and curl, and they were all similarly slow. Any ideas on where to look to fix this?

ecurbh asked Sep 02 '10 17:09



2 Answers

Note: This has been fixed in PHP 5.6.14. A Connection: close header will now automatically be sent even for HTTP/1.0 requests. See commit 4b1dff6.

I had a hard time figuring out the cause of the slowness of file_get_contents scripts.

By analyzing the traffic with Wireshark, I found that the issue (in my case, and probably yours too) was that the remote web server did not close the TCP connection for 15 seconds (i.e. "keep-alive").

Indeed, file_get_contents doesn't send a "Connection" HTTP header, so the remote web server assumes by default that it's a keep-alive connection and doesn't close the TCP stream for 15 seconds (this may not be a standard value; it depends on the server configuration).

A normal browser considers the page fully loaded once the HTTP payload length reaches the length specified in the response's Content-Length header. file_get_contents doesn't do this, and that's a shame.

SOLUTION

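So, if you want to know the solution, here it is:

// The header must be in double quotes: in single quotes PHP sends \r\n as literal characters.
$context = stream_context_create(array('http' => array('header' => "Connection: close\r\n")));
$content = file_get_contents("http://www.something.com/somepage.html", false, $context);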

The point is simply to tell the remote web server to close the connection when the download is complete, since file_get_contents isn't smart enough to do so by itself using the response's Content-Length header.
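
To check whether the header actually makes a difference on your server, it can help to time the call and to cap how long a read may block. The following is only a minimal sketch: the helper names buildCloseContextOptions and timedFetch are hypothetical, and the 10-second timeout is an arbitrary assumption, not a recommended value.

```php
<?php
// Hypothetical helper: returns stream-context options that ask the remote
// server to close the connection and that cap how long a read may block.
function buildCloseContextOptions(float $timeoutSeconds = 10.0): array
{
    return [
        'http' => [
            'method'  => 'GET',
            // Double quotes, so that \r\n is a real CRLF.
            'header'  => "Connection: close\r\n",
            // Read timeout in seconds (assumed value, tune as needed).
            'timeout' => $timeoutSeconds,
        ],
    ];
}

// Hypothetical helper: fetches $url with the given context options and
// reports how long the call took. 'body' is false if the fetch failed.
function timedFetch(string $url, array $options): array
{
    $context = stream_context_create($options);
    $start   = microtime(true);
    $body    = @file_get_contents($url, false, $context);

    return ['body' => $body, 'seconds' => microtime(true) - $start];
}

// Example usage (performs a real request, so it is left commented out):
// $result = timedFetch('http://example.com/test.txt', buildCloseContextOptions());
// printf("%.2f seconds\n", $result['seconds']);
```

If the measured time stays the same with and without the Connection: close header, the keep-alive behaviour described above is probably not the cause in your case.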

KrisWebDev answered Sep 28 '22 10:09


I would use cURL to fetch external content, as it is often quicker than file_get_contents. I'm not sure whether this will solve the issue, but it's worth a shot.

Also note that your server's speed will affect the time it takes to retrieve the file.

Here is an example of usage:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://example.com/test.txt');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
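
Since the question reports that every method is slow for every external URL, cURL can also be used to see where the time goes. The sketch below reads the timing fields that curl_getinfo reports and splits them into phases. The function names summarizeTimings and fetchWithTimings are hypothetical, and the 5 and 30 second timeouts are assumptions chosen only for illustration.

```php
<?php
// Hypothetical helper: turns curl_getinfo()'s cumulative timings into
// per-phase durations, in seconds.
function summarizeTimings(array $info): array
{
    $dns       = $info['namelookup_time'] ?? 0.0;    // DNS resolution finished
    $connect   = $info['connect_time'] ?? 0.0;       // TCP connection established
    $firstByte = $info['starttransfer_time'] ?? 0.0; // first response byte received
    $total     = $info['total_time'] ?? 0.0;         // whole transfer finished

    return [
        'dns'      => $dns,
        'connect'  => max(0.0, $connect - $dns),
        // For https URLs this phase also includes the TLS handshake.
        'wait'     => max(0.0, $firstByte - $connect),
        'transfer' => max(0.0, $total - $firstByte),
        'total'    => $total,
    ];
}

// Hypothetical helper: fetches $url and returns the body with its timings.
function fetchWithTimings(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // assumed limit for connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);       // assumed limit for the whole request
    $body = curl_exec($ch);
    $info = curl_getinfo($ch);
    curl_close($ch);

    return ['body' => $body, 'timings' => summarizeTimings($info)];
}

// Example usage (performs a real request, so it is left commented out):
// print_r(fetchWithTimings('http://example.com/test.txt')['timings']);
```

A large 'dns' value would suggest looking at the server's name resolution rather than at PHP itself, while a large 'transfer' value on a small file is more consistent with a connection that is being held open.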
Jim answered Sep 28 '22 09:09