 

PHP: Delay parsing of page source (via file_get_html()) by 1 second

Tags: php, curl

I am using PHP to scrape a page that seems to load content dynamically, just milliseconds after the parent page finishes loading.

I am using cURL to fetch the page, and Simple HTML DOM to extract elements from the fetched HTML.

My attempts to traverse the DOM, or to explode() values out of the raw HTML, return nothing. My only idea is that the content is loaded after the parent page has finished loading.

Here is my code.

<?php
  $url = 'http://www.facebook.com/OneAndroidAppaDay';
  $scrapeUrl = 'http://www.facebook.com/OneAndroidAppaDay';

  include_once('simple_html_dom.php');
  require_once("bitly.php");

  $userAgent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)';
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
  curl_setopt($ch, CURLOPT_URL,$scrapeUrl);
  curl_setopt($ch, CURLOPT_FAILONERROR, true);
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
  curl_setopt($ch, CURLOPT_AUTOREFERER, true);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
  curl_setopt($ch, CURLOPT_TIMEOUT, 10);
  $html = curl_exec($ch);
  if (!$html) {
   echo "<br />cURL error number:" .curl_errno($ch);
   echo "<br />cURL error:" . curl_error($ch);
   exit;
  }

  $appBitlyUrl = $html->find('div[class=UIStoryAttachment_Title]',0)->find('a',0)->href; // fail :(
  echo 'Bitly Url:  ' . $appBitlyUrl;
?>

It's bombing out at line 24 (denoted with the inline comment) with this error:

Fatal error: Call to a member function find() on a non-object in /home/xxxxxxxx/public_html/xxx.xx/xxxx.php on line 24
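For reference, this error is consistent with `$html` being a plain string rather than a parsed document: with `CURLOPT_RETURNTRANSFER` set, `curl_exec()` returns the response body as a string (or `false` on failure), and calling `->find()` on a string produces this "non-object" fatal error. With Simple HTML DOM itself, the usual step is to pass the string to `str_get_html()` before calling `find()`. The sketch below shows the same fetch-then-parse idea under some assumptions: it swaps Simple HTML DOM for PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`) so it has no external dependency, the helper names `fetch_html()` and `first_link_href()` are hypothetical, and the sample markup is made up rather than taken from the real page.

```php
<?php
// Sketch: fetch with cURL, then parse the returned string before querying it.
// Uses PHP's built-in DOM extension in place of Simple HTML DOM.

// Hypothetical helper. curl_exec() with CURLOPT_RETURNTRANSFER returns the body
// as a string (or false on failure); a string has no find() method.
function fetch_html(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body;
}

// Hypothetical helper. Parses an HTML string and returns the href of the first
// <a href> (in document order) inside any element whose class list includes
// $className, or null when nothing matches.
function first_link_href(string $html, string $className): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Scraped markup is rarely valid HTML; collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Match $className as a whole token of the class attribute.
    // (Assumes $className contains no double quotes.)
    $xpath = new DOMXPath($doc);
    $query = '//*[contains(concat(" ", normalize-space(@class), " "), " '
        . $className . ' ")]//a[@href]';
    $links = $xpath->query($query);
    if ($links === false || $links->length === 0) {
        return null;
    }
    return $links->item(0)->getAttribute('href');
}

// Usage on a hypothetical snippet (no network access needed):
$sample = '<div class="UIStoryAttachment_Title"><a href="http://example.com/app">App</a></div>';
echo first_link_href($sample, 'UIStoryAttachment_Title'), "\n"; // prints http://example.com/app
// In a real script: first_link_href(fetch_html($scrapeUrl), 'UIStoryAttachment_Title')
```

Keeping the fetch and the parse in separate functions makes it easy to check the parsing step against saved HTML. Note that if the target markup is injected by JavaScript, it will not be in the string cURL returns, so the query would still come back empty.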

Is there a way to make it wait a second or two before it grabs the page's HTML? Or does anyone have better insight?

Thanks

Mark

Asked by marky-b, Nov 20 '25 at 06:11


1 Answer

To add a simple delay:

sleep(2); // 2 second delay before continuing
Answered by Patrick, Nov 21 '25 at 20:11
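A minimal sketch of the delay functions themselves. `sleep()` accepts whole seconds, so the one-second delay from the title would be `sleep(1)`; `usleep()` takes microseconds when a fraction of a second is wanted. Keep in mind that either call only pauses the PHP script: cURL downloads the raw server response and never runs the page's JavaScript, so content that a browser injects after load will not appear in the fetched HTML however long the script waits.

```php
<?php
// sleep() takes whole seconds; usleep() takes microseconds for finer delays.
// Both pause only this PHP script. Neither makes cURL execute the page's
// JavaScript, so markup that a browser adds client-side stays missing.
$start = microtime(true);
sleep(1);        // pause for 1 second (the delay asked for in the title)
usleep(250000);  // pause for a further 0.25 seconds
$elapsed = microtime(true) - $start;
printf("Paused for about %.2f seconds\n", $elapsed);
```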


