Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

file_get_contents is not working for some url

I use file_get_contents in PHP. In the below code in first URL works fine but the second one isn't working.


$URL = "http://test6473.blogspot.com";
$domain = file_get_contents($URL);
print_r($domain);


$add_url= "http://adfoc.us/1575051";
$add_domain = file_get_contents($add_url);
echo $add_domain;

Any suggestions on why the second one doesn't work?

like image 428
Parixit Avatar asked Jun 28 '13 11:06

Parixit


4 Answers

URL which is not retrieved by file_get_contents, because their server checks whether the request come from browser or any script. If they found request from script they simply disable page contents.

So that I have to make a request similar as browser request. So I have used following code to get 2nd url contents. It might be different for different web server. Because they might keep different checks.

Even though why dont you try to use following code! If you are lucky this might work for you!!

function getUrlContent($url) {
    fopen("cookies.txt", "w");
    $parts = parse_url($url);
    $host = $parts['host'];
    $ch = curl_init();
    $header = array('GET /1575051 HTTP/1.1',
        "Host: {$host}",
        'Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language:en-US,en;q=0.8',
        'Cache-Control:max-age=0',
        'Connection:keep-alive',
        'Host:adfoc.us',
        'User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36',
    );

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
    curl_setopt($ch, CURLOPT_COOKIESESSION, true);

    curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
    curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

$url = "http://adfoc.us/1575051";
$html = getUrlContent($url);

Thanks everyone for the guidance.

like image 178
Parixit Avatar answered Sep 20 '22 07:09

Parixit


Unfortunately it looks like the second site blocks access from unrecognized browsers. Even using curl from the command line doesn't work:

curl -I http://adfoc.us/1575051

gives:

HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Fri, 28 Jun 2013 12:15:40 GMT
Content-Type: text/html
Connection: keep-alive
X-Powered-By: PHP/5.5.0
Set-Cookie: __cfduid=d7cd1bf18c136a288cc2b36065a3b31f01372421740; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.adfoc.us
CF-RAY: 85a4dc6829e06d0

but no content. Note it returns status 200 so if you check the returned string for boolean === false to see if it failed, it will actually appear as if it has worked.

If you need to spoof the useragent (and possibly other things) to try and get the url to accept your request, you'll need to take the plunge with the curl libraries and try different combinations to try and get it working. Experimenting to see what works with the curl command line first would also be a good way to reduce development time in investigating this.

Here's someone who has been through this before:

php curl: how can i emulate a get request exactly like a web browser?

like image 30
fquinner Avatar answered Sep 20 '22 07:09

fquinner


looks like the second url answers too slow sometimes, maybe have redirects. try to use curl and set bigger timeout. also, turn errors on

error_reporting(-1);
ini_set('display_errors','On');
like image 37
13DaGGeR Avatar answered Sep 23 '22 07:09

13DaGGeR


you can try this code also

<?php

function getUrlContent($url) {
    $parts = parse_url($url);
    $host = $parts['host'];
    $ch = curl_init();
    $header = array('GET /1575051 HTTP/1.1',
        "Host: {$host}",
        'Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language:en-US,en;q=0.8',
        'Cache-Control:max-age=0',
        'Connection:keep-alive',
        'Host:adfoc.us',
        'User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36',
    );

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

$url = "https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en";
$html = getUrlContent($url);

$xml = simplexml_load_string($html);
$json = json_encode($xml);
$array = json_decode($json,TRUE);


print_r($array);
?>
like image 41
Deepak dev Avatar answered Sep 21 '22 07:09

Deepak dev