I'm writing a website in PHP that aggregates data from various other websites. I have a function 'returnPageSource' that takes a URL and returns the html from that URL as a string.
function returnPageSource($url){
    $ch = curl_init();
    $timeout = 5; // set to zero for no timeout
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page as a string instead of printing it
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // how long to wait to connect, in seconds
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    //curl_setopt($ch, CURLOPT_HEADER, false); // only request body
    $fileContents = curl_exec($ch); // the html source of the requested page, or false on failure
    curl_close($ch);
    return $fileContents;
}
This works fine for some of the websites I need, like http://atensembl.arabidopsis.info/Arabidopsis_thaliana_TAIR/unisearch?species=Arabidopsis_thaliana_TAIR;idx=;q=At5g02310, but not for others, like http://www.bar.utoronto.ca/efp/cgi-bin/efpWeb.cgi?dataSource=Chemical&modeInput=Absolute&primaryGene=At5g02310&orthoListOn=0 . Does anybody have any idea why?
Update
Thanks for the responses. I've changed my user agent to match my browser (Firefox 3, which can access the sites fine) and changed the timeout to 0. I still can't connect, but I can now get some error messages. curl_error() gives me the error "couldn't connect to host", and curl_getinfo($ch, CURLINFO_HTTP_CODE); returns HTTP code 0, neither of which is very helpful. I've also tried curl_setopt($ch, CURLOPT_VERBOSE, 1);, but that displayed nothing. Does anybody have any other ideas?
Final Update
I just realised I didn't explain what was wrong - I just needed to enter the proxy settings for my university (I'm using the university's server). Everything worked fine after that!
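For anyone hitting the same problem, here is a minimal sketch of what routing the request through a proxy might look like. The exact settings used are not shown above, so the proxy address, the optional credentials, and the helper names `buildCurlOptions` and `returnPageSourceViaProxy` are all hypothetical; only the `CURLOPT_PROXY` and `CURLOPT_PROXYUSERPWD` options themselves are standard PHP cURL options.

```php
<?php
// Hypothetical helper: builds the cURL options for a request, optionally
// routed through a proxy. Replace the placeholder values with whatever
// your own network requires.
function buildCurlOptions(string $url, ?string $proxy = null, ?string $proxyAuth = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the page as a string
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_CONNECTTIMEOUT => 5,     // seconds to wait for the connection
    ];
    if ($proxy !== null) {
        $options[CURLOPT_PROXY] = $proxy;             // "host:port"
    }
    if ($proxyAuth !== null) {
        $options[CURLOPT_PROXYUSERPWD] = $proxyAuth;  // "user:password"
    }
    return $options;
}

// Hypothetical variant of returnPageSource() that applies those options.
function returnPageSourceViaProxy(string $url, ?string $proxy = null, ?string $proxyAuth = null)
{
    $ch = curl_init();
    curl_setopt_array($ch, buildCurlOptions($url, $proxy, $proxyAuth));
    return curl_exec($ch); // string on success, false on failure
}
```

It could be called as `returnPageSourceViaProxy($url, 'proxy.example.edu:8080')`, where the host and port are placeholders for the real proxy.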
You should use curl_error() to check which error has occurred (if any).
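As a rough sketch of that advice, the function below collects the cURL error, the HTTP status code, and the verbose log in one place. The name `fetchWithDiagnostics` and the shape of the returned array are assumptions made for illustration. Verbose output goes to STDERR by default, which may be why `CURLOPT_VERBOSE` appeared to display nothing on a web page, so the sketch redirects it to an in-memory stream with `CURLOPT_STDERR`.

```php
<?php
// Hypothetical helper: fetches a URL and reports what went wrong on failure.
function fetchWithDiagnostics(string $url, int $connectTimeout = 5): array
{
    $ch  = curl_init();
    $log = fopen('php://temp', 'w+'); // in-memory buffer for the verbose log

    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => $connectTimeout,
        CURLOPT_VERBOSE        => true,  // log connection details...
        CURLOPT_STDERR         => $log,  // ...into our buffer instead of STDERR
    ]);

    $body = curl_exec($ch); // false on failure

    // Read the diagnostics before the handle is released.
    $result = [
        'body'      => $body === false ? null : $body,
        'errno'     => curl_errno($ch),  // 0 means no error
        'error'     => curl_error($ch),  // human-readable message, '' if none
        'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
    ];
    unset($ch);

    rewind($log);
    $result['verbose'] = stream_get_contents($log);
    fclose($log);

    return $result;
}
```

If `$result['body']` is null, `$result['error']` and `$result['verbose']` should show whether the failure happened during name resolution, the connection attempt, or later in the request.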