I'm trying to parse the http://whatismyip.com page to get my location (state and country). The data seems to be inside <table class="table"> tags, so I'm looking for "table".
But I get an error: Warning: file_get_contents(https://whatismyip.com): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden in C:\xampp4\htdocs\scraping\libs\simple_html_dom.php on line 1081
I can't figure out what's wrong.
<?php
require_once('libs/simple_html_dom.php');
$html=new simple_html_dom();
$html->load_file('https://whatismyip.com');
$element=$html->find("table");
?>
That website is checking the User-Agent
header of the request but PHP doesn't send any (by default). You'll have to "impersonate" a browser:
$context = stream_context_create(array(
    'http' => array(
        'header' => array('User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201'),
    ),
));
$html = file_get_contents('http://whatismyip.com/', false, $context);
// do what you want with the $html
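As a rough sketch of the "do what you want" step, the fetched markup can be walked with PHP's built-in DOM extension instead of simple_html_dom. The helper name extractTableRows and the sample markup (including the "Ohio" and "US" values) are made up for illustration; the real page's table layout may differ.

```php
<?php
// Hypothetical helper: returns the trimmed cell text of every table row.
function extractTableRows(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Scraped pages are rarely valid HTML, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//table//tr') as $tr) {
        $cells = [];
        // Only direct <th>/<td> children of this row.
        foreach ($xpath->query('./th|./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Assumed sample markup standing in for the downloaded page.
$sample = '<table class="table"><tr><th>State</th><td>Ohio</td></tr>'
        . '<tr><th>Country</th><td>US</td></tr></table>';
print_r(extractTableRows($sample));
```

In practice you would pass the $html string returned by file_get_contents() to the helper and pick out the rows whose first cell names the field you need.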
A better (and faster) option would be to use some library for this. I've used GeoIP2-php before but I'm sure there are more.
Basically, your example is fine; the problem here is that the simple_html_dom classes are not working with HTTPS, so fetch the page another way, for example with cURL:
$curl = curl_init();
// Disables TLS certificate verification (insecure; avoid outside local testing)
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
// Leave the response headers out of the returned string
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, "https://whatismyip.com");
curl_setopt($curl, CURLOPT_REFERER, "https://whatismyip.com");
// Return the body from curl_exec() instead of printing it
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201');
$str = curl_exec($curl);
curl_close($curl);
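and then parse the returned string with your code. Note that load_file() expects a path or URL; to parse a string that is already in memory, use load():
$html = new simple_html_dom();
$html->load($str);
$element = $html->find("table");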
Edit: added a User-Agent to emulate a real browser (thanks to ShiraNai7).