 

PHP Simple HTML DOM Parser returning false on valid url

I'm trying the following:

$url = 'https://www.tripadvisor.es/Hotels-g187514-Madrid-Hotels.html';

$ta_html = file_get_html($url);
var_dump($ta_html);

It returns false. The same code works and correctly fetches the HTML for:

$url = 'https://www.tripadvisor.es/Hotels-g294316-Lima_Lima_Region-Hotels.html#ACCOM_OVERVIEW';

My first thought was that the first URL had a redirect, but I checked the headers with curl and got 200 OK, and the headers looked the same in both cases. What could be happening, and how can it be solved?

This seems to be a duplicate of this problem, which is also unanswered: Simple HTML DOM returning false

Aschab asked Apr 22 '17 17:04




1 Answer

It looks like Simple HTML DOM is failing because the HTML is larger than the library's maximum file size. When you call file_get_html(), it checks the size of the content against its MAX_FILE_SIZE constant and returns false if the content is too large. So before calling any Simple HTML DOM methods, increase the maximum file size used by the library:

define('MAX_FILE_SIZE', 1200000); // or larger if needed, default is 600000
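
To check whether a given page is large enough to hit this limit, a minimal sketch like the following may help. The helper exceeds_max_size() is hypothetical and not part of Simple HTML DOM; it only mirrors the kind of byte-length comparison described above, using the 600000 default mentioned in the comment. Synthetic strings stand in for downloaded pages so the sketch runs without network access.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): reports whether a
// document is longer than a given byte limit. The default of 600000 is
// the library default quoted in the answer above.
function exceeds_max_size(string $html, int $maxBytes = 600000): bool
{
    // strlen() counts bytes, which is what a byte-size limit is compared against.
    return strlen($html) > $maxBytes;
}

// Synthetic content stands in for real pages so this runs offline.
$smallPage = str_repeat('a', 1000);
$largePage = str_repeat('a', 700000);

var_dump(exceeds_max_size($smallPage));          // bool(false)
var_dump(exceeds_max_size($largePage));          // bool(true)
var_dump(exceeds_max_size($largePage, 1200000)); // bool(false)
```

In practice you would pass the string returned by your own download of each URL; if the failing URL yields more than 600000 bytes and the working one does not, the size limit is the likely cause.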

As you found out, you can also work around the file size check by fetching the HTML yourself and loading the string directly:

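$str = file_get_contents($url); // assumed: any method that returns the HTML as a string works
$html = new simple_html_dom();
$html->load($str); // load() parses the string directly, without the file size check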

Jim answered Nov 01 '22 03:11