 

Why does this function using cURL work for some URLs but not others?

Tags: php, curl

I'm writing a website in PHP that aggregates data from various other websites. I have a function, returnPageSource, that takes a URL and returns the HTML from that URL as a string.

function returnPageSource($url){
    $ch = curl_init();
    $timeout = 5;   // set to zero for no timeout       

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // means the page is returned
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // how long to wait to connect
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);     // follow redirects
    //curl_setopt($ch, CURLOPT_HEADER, false);          // don't include response headers in the output

    $fileContents = curl_exec($ch); // $fileContents contains the html source of the required website
    curl_close($ch);

    return $fileContents;
}

This works fine for some of the websites I need, like http://atensembl.arabidopsis.info/Arabidopsis_thaliana_TAIR/unisearch?species=Arabidopsis_thaliana_TAIR;idx=;q=At5g02310, but not for others, like http://www.bar.utoronto.ca/efp/cgi-bin/efpWeb.cgi?dataSource=Chemical&modeInput=Absolute&primaryGene=At5g02310&orthoListOn=0. Does anybody have any idea why?

Update

Thanks for the responses. I've changed my user agent to match my browser (Firefox 3, which can access the sites fine) and set the timeout to 0, but I still can't connect. I can, however, get some error messages: curl_error() gives "couldn't connect to host", and curl_getinfo($ch, CURLINFO_HTTP_CODE) returns HTTP code 0. Neither is very helpful. I've also tried curl_setopt($ch, CURLOPT_VERBOSE, 1), but that displayed nothing. Does anybody have any other ideas?
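With CURLOPT_VERBOSE enabled, libcurl writes its trace to the process's standard error stream (or to the stream set with CURLOPT_STDERR), not into the page output, which may explain why nothing appeared in the browser. A minimal sketch, assuming the curl extension is loaded, that redirects the trace into a temporary PHP stream so it can be printed or logged; the function name fetchWithVerboseLog is hypothetical.

```php
<?php
// Hypothetical helper: fetch $url and capture libcurl's verbose trace.
function fetchWithVerboseLog(string $url, int $timeout = 5): array
{
    $log = fopen('php://temp', 'w+');               // temporary stream to hold the trace

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_VERBOSE, true);        // turn the trace on...
    curl_setopt($ch, CURLOPT_STDERR, $log);         // ...and send it to $log instead of stderr

    $body = curl_exec($ch);                         // false on failure
    curl_close($ch);

    rewind($log);                                   // read the trace from the start
    $trace = stream_get_contents($log);
    fclose($log);

    return ['body' => $body, 'trace' => $trace];
}
```

Printing $r['trace'] when $r['body'] === false typically shows which host and port cURL tried to reach and why the connection failed.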

Final Update

I just realised I didn't explain what was wrong: I needed to enter the proxy settings for my university (I'm using the university's server). Everything worked fine after that!
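For anyone who hits the same "couldn't connect to host" error behind an institutional proxy, here is a minimal sketch of passing proxy settings to cURL with CURLOPT_PROXY, CURLOPT_PROXYPORT and, if the proxy requires a login, CURLOPT_PROXYUSERPWD. The function names proxyOptions and returnPageSourceViaProxy are hypothetical, and any host, port or credentials you pass in are placeholders for the values your network administrator provides.

```php
<?php
// Build the proxy-related cURL options (values are placeholders).
function proxyOptions(string $host, int $port, ?string $userPwd = null): array
{
    $options = [
        CURLOPT_PROXY     => $host,
        CURLOPT_PROXYPORT => $port,
    ];
    if ($userPwd !== null) {
        $options[CURLOPT_PROXYUSERPWD] = $userPwd;  // format: "user:password"
    }
    return $options;
}

// Same behaviour as returnPageSource(), but routed through a proxy.
function returnPageSourceViaProxy(string $url, array $proxyOptions)
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the page instead of printing it
        CURLOPT_URL            => $url,
        CURLOPT_CONNECTTIMEOUT => 5,      // seconds to wait for the (proxy) connection
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
    ] + $proxyOptions);                   // array union keeps the integer option keys intact

    $fileContents = curl_exec($ch);       // false on failure
    curl_close($ch);

    return $fileContents;
}
```

The options are merged with the + operator rather than array_merge(), because array_merge() renumbers integer keys and the CURLOPT_* constants are integers.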

Asked Mar 01 '23 at 00:03 by Daniel



1 Answer

You should use curl_error() to check which error has occurred (if any).
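A minimal sketch of that check, assuming the curl extension is loaded; the helper name fetchWithDiagnostics and its return shape are hypothetical. When curl_exec() returns false, curl_errno() and curl_error() describe the transport failure. An HTTP code of 0 from curl_getinfo() only means that no HTTP response was received, which is consistent with a "couldn't connect to host" error (CURLE_COULDNT_CONNECT, error number 7).

```php
<?php
// Hypothetical helper: fetch $url and report why it failed, if it did.
function fetchWithDiagnostics(string $url, int $timeout = 5): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    $body = curl_exec($ch);

    // Read the diagnostics before the handle is closed.
    $result = [
        'body'      => $body,                                 // false on failure
        'errno'     => curl_errno($ch),                       // 0 means no transport error
        'error'     => curl_error($ch),                       // '' when errno is 0
        'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE), // 0 if no response arrived
    ];
    curl_close($ch);

    return $result;
}
```

Logging $result['errno'] and $result['error'] whenever $result['body'] === false shows the underlying libcurl error instead of a silent false return.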

Answered Mar 08 '23 at 22:03 by Greg
