
simple html dom failed to open stream for a site

I'm trying to parse the http://whatismyip.com page and get my location (state and country). The data seems to be inside <table class="table"> tags, so I'm looking for "table". But I get this warning:

Warning: file_get_contents(https://whatismyip.com): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden in C:\xampp4\htdocs\scraping\libs\simple_html_dom.php on line 1081

Can't figure out what's wrong.

<?php
require_once('libs/simple_html_dom.php');
$html = new simple_html_dom();

$html->load_file('https://whatismyip.com');

$element = $html->find("table");
?>
parsecer asked Dec 03 '22 14:12


2 Answers

That website is checking the User-Agent header of the request but PHP doesn't send any (by default). You'll have to "impersonate" a browser:

$context = stream_context_create(array(
    'http' => array(
        'header' => array('User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201'),
    ),
));

$html = file_get_contents('http://whatismyip.com/', false, $context);

// do what you want with the $html
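
To see why a request was refused instead of only getting a warning, the same stream-context idea can be wrapped in a few helpers. This is a rough sketch, not a definitive implementation: `build_browser_context()`, `last_status_code()` and `fetch_page()` are hypothetical names made up for illustration, and the 10-second timeout is an arbitrary choice. It relies on the `ignore_errors` HTTP context option and on the `$http_response_header` variable that PHP's HTTP wrapper fills in after a request.

```php
<?php
// Sketch only: these helper names are hypothetical, not part of PHP
// or of simple_html_dom.

// Build a stream context that sends a browser-like User-Agent header.
function build_browser_context($userAgent)
{
    return stream_context_create(array(
        'http' => array(
            'header'        => "User-Agent: " . $userAgent . "\r\n",
            'ignore_errors' => true, // still return the body on 4xx/5xx responses
            'timeout'       => 10,   // seconds; arbitrary value for this sketch
        ),
    ));
}

// Return the status code of the last "HTTP/x.y NNN" line, so that the
// final response wins when redirects were followed.
function last_status_code(array $responseHeaders)
{
    $status = null;
    foreach ($responseHeaders as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Fetch a page; returns the body, or null when the request failed or
// the server answered with an error status such as 403.
function fetch_page($url, $userAgent)
{
    $body = @file_get_contents($url, false, build_browser_context($userAgent));
    // $http_response_header is populated by the HTTP wrapper in this scope.
    $headers = isset($http_response_header) ? $http_response_header : array();
    $status = last_status_code($headers);
    if ($body === false || $status === null || $status >= 400) {
        return null;
    }
    return $body;
}
```

Calling `fetch_page('https://whatismyip.com/', $userAgent)` with a browser-like User-Agent string should then give either the page body or `null`. Whether the site accepts a given User-Agent is up to the site, so the 403 may still come back.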

A better (and faster) option would be to use some library for this. I've used GeoIP2-php before but I'm sure there are more.

ShiraNai7 answered Dec 20 '22 07:12


Basically your example is fine; the problem here is that the simple html dom classes are not working with HTTPS, so try fetching the page another way, with cURL:

$curl = curl_init();
// Skips certificate verification; convenient for a local test, but insecure.
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_HEADER, false);          // leave response headers out of the output
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
curl_setopt($curl, CURLOPT_URL, "https://whatismyip.com");
curl_setopt($curl, CURLOPT_REFERER, "https://whatismyip.com");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201');
$str = curl_exec($curl);                            // false on failure
curl_close($curl);

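and then use your code. Because $str already holds the HTML, call load() rather than load_file(), which expects a path or URL:

    $html = new simple_html_dom();
    $html->load($str);
    $element = $html->find("table");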
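
Once the HTML is in a string, reading the state and country out of the table could look roughly like the sketch below. It swaps in PHP's built-in DOMDocument and DOMXPath in place of simple html dom, so it runs without the library (the DOM extension must be enabled). The sample markup is an assumption made up for illustration, since the real page's table may be laid out differently, and `extract_table_pairs()` is a hypothetical helper name.

```php
<?php
// Hypothetical sample markup; the real page's table may differ.
$str = '<html><body><table class="table">'
     . '<tr><th>State</th><td>Example State</td></tr>'
     . '<tr><th>Country</th><td>Example Country</td></tr>'
     . '</table></body></html>';

// Hypothetical helper: maps the first cell of each row to the second
// cell, for every row inside a <table> carrying the class "table".
function extract_table_pairs($html)
{
    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match "table" as a whole class token, not as a substring.
    $rows = $xpath->query(
        '//table[contains(concat(" ", normalize-space(@class), " "), " table ")]//tr'
    );

    $pairs = array();
    foreach ($rows as $row) {
        $cells = $xpath->query('./th|./td', $row);
        if ($cells->length >= 2) {
            $label = trim($cells->item(0)->textContent);
            $pairs[$label] = trim($cells->item(1)->textContent);
        }
    }
    return $pairs;
}

$pairs = extract_table_pairs($str);
```

With the sample markup above, `$pairs` is keyed by the label cells ("State", "Country"). On the real page the keys depend on whatever text the site puts in the first cell of each row.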

Edit: added a User-Agent to emulate a real browser (thanks to ShiraNai7).

Abderrahim Soubai-Elidrisi answered Dec 20 '22 07:12