Simple HTML DOM: scraping a large HTML file

I need to scrape a large HTML file (e.g. http://www.indianrail.gov.in/mail_express_trn_list.html) using Simple HTML DOM. I started with a simple script:

<?php
require "simple_html_dom.php";
echo file_get_html('http://www.indianrail.gov.in/mail_express_trn_list.html')->plaintext;
?>

This shows nothing, just a blank page, and the following notices appear in the Apache error.log file:

 PHP Notice:  Trying to get property of non-object in /var/www/index.php on line 3
 PHP Notice:  Trying to get property of non-object in /var/www/index.php on line 3

At the same time, other pages (e.g. http://www.indianrail.gov.in/special_trn_list.html) work fine with the same script.

asked Jul 30 '13 05:07 by krizna


1 Answer

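The issue appears to be the MAX_FILE_SIZE constant defined in simple_html_dom.php. When the downloaded page is longer than this limit, file_get_html() returns false instead of a DOM object, so reading ->plaintext on the result triggers the "Trying to get property of non-object" notice and nothing is printed. Smaller pages stay under the limit, which is why they work with the same script.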

You can raise the limit by editing the define('MAX_FILE_SIZE', 600000); line in simple_html_dom.php (the default is 600000 bytes) and setting a value larger than the page you are fetching.
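To confirm that the size limit is really the cause before editing the library, a small diagnostic can reproduce the check. This is a minimal sketch, assuming allow_url_fopen is enabled and the library's default limit of 600000 bytes; diagnose_page() and SHD_DEFAULT_MAX_FILE_SIZE are hypothetical names, not part of simple_html_dom. It only mirrors the "empty or longer than MAX_FILE_SIZE" guard so you can see which condition fails.

```php
<?php
// Hypothetical constant (not from the library): simple_html_dom's default limit.
// Change it if your copy of simple_html_dom.php defines a different value.
const SHD_DEFAULT_MAX_FILE_SIZE = 600000;

/**
 * Hypothetical helper: fetch a page (URL or local path) and report whether
 * simple_html_dom's size guard would make file_get_html() return false.
 * Returns [contents or null, human-readable message].
 */
function diagnose_page(string $source, int $limit = SHD_DEFAULT_MAX_FILE_SIZE): array
{
    // @ suppresses the warning so the failure is reported via the message instead.
    $contents = @file_get_contents($source);
    if ($contents === false || $contents === '') {
        return [null, "fetch failed or empty: $source"];
    }

    // strlen() counts bytes, which is what the library's guard compares.
    $size = strlen($contents);
    if ($size > $limit) {
        return [$contents, "too large: $size bytes > $limit (file_get_html() would return false)"];
    }
    return [$contents, "ok: $size bytes"];
}
```

For example, calling diagnose_page('http://www.indianrail.gov.in/mail_express_trn_list.html') and echoing the returned message shows the page size; if it reports "too large", set MAX_FILE_SIZE to a value above that size.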

answered Nov 09 '22 18:11 by DevZer0