
scraping with xpath, giving an error

Tags:

dom

php

xpath

I am trying to get the text from a scraped page using XPath, but I keep getting an error and have no idea why. Bear in mind I am a very new PHP user; this is for a university project that I've taken on, and it's proving to be very challenging :P but hey, it should be.

Here's the code:

<?php

$html = file_get_contents('http://www.amazon.co.uk/New-Apple-iPod-touch-Generation/dp/B0040GIZTI/ref=br_lf_m_1000333483_1_1_img?ie=UTF8&s=electronics&pf_rd_p=229345967&pf_rd_s=center-3&pf_rd_t=1401&pf_rd_i=1000333483&pf_rd_m=A3P5ROKL5A1OLE&pf_rd_r=1ZW9HJW2KN2C2MTRJH60');

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXpath($dom);

$in_stock = $xpath->query("/html/body/div[@id='divsinglecolumnminwidth']/form[@id='handleBuy']/table[3]/tbody/tr[3]/td/div/span");



?>

I get the following error...

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : head in Entity, line: 2664 in C:\xampp\htdocs\scraping\domxpath.php on line 19

About a hundred times!

Any help is really appreciated! It must be really easy to fix :P

Wade, asked Dec 18 '25 13:12


2 Answers

Put this line at the top of your code to stop the warnings from being displayed. It is particularly helpful when the document is an HTML page and you don't know whether its markup is well-formed.

libxml_use_internal_errors(true);

https://www.php.net/manual/fr/function.libxml-use-internal-errors.php
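
To make the effect of that call a bit more concrete, here is a minimal sketch. The $html string is a made-up stand-in for the fetched page (it has a stray closing head tag, similar to what the warning complains about), and restoring the previous setting at the end is just one reasonable habit, not something the question requires. The collected errors can still be read with libxml_get_errors() if you want to see them.

```php
<?php
// Hypothetical malformed markup standing in for the fetched page.
$html = '<html><head><title>Demo</title></head></head>'
      . '<body><p>In stock.</p></body></html>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Read whatever the parser complained about, then empty the error buffer
// and put the error-handling mode back the way it was.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    // Each entry is a LibXMLError object (message, line, level, code, ...).
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
```

The document is still parsed and usable afterwards; only the reporting of the problems changes.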

mravey, answered Dec 21 '25 04:12


$xpath = new DOMXpath($dom);

$expr = "/html/body/div[@id='divsinglecolumnminwidth']/form[@id='handleBuy']/table[3]/tr[3]/td/div/span";
$nodes = $xpath->query($expr); // returns DOMNodeList object
// you can check length property i.e. $nodes->length
echo $nodes->item(0)->nodeValue; // get first DOMNode object and its value

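You also need to add a statement to suppress the errors. I think that for performance reasons it's better to use an absolute XPath expression, but the relative //form[@id='handleBuy']/table[3]/tr[3]/td/div/span works as well and is more flexible. Note that the expression above has no tbody step: browsers often insert that element into the DOM even when the page source does not contain it, so it may not exist in what loadHTML() parses.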
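
Putting the pieces together, a rough sketch of the whole flow could look like this. The markup is a hypothetical, heavily simplified stand-in for the Amazon page (only the form id and the table/row positions are taken from the question), and the $in_stock fallback to null is my own choice for illustration. It checks the result of query() before touching item(0), since an empty node list would otherwise lead to an error on a null node.

```php
<?php
// Hypothetical, simplified markup; the real page is far larger.
$html = <<<HTML
<html><body>
<form id="handleBuy">
  <table><tr><td>a</td></tr></table>
  <table><tr><td>b</td></tr></table>
  <table>
    <tr><td>row 1</td></tr>
    <tr><td>row 2</td></tr>
    <tr><td><div><span>In stock.</span></div></td></tr>
  </table>
</form>
</body></html>
HTML;

// Suppress parser warnings while loading the HTML.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Relative expression, with no tbody step.
$nodes = $xpath->query("//form[@id='handleBuy']/table[3]/tr[3]/td/div/span");

// query() returns false for an invalid expression and an empty
// DOMNodeList when nothing matches, so check both before using item(0).
if ($nodes === false || $nodes->length === 0) {
    $in_stock = null;
} else {
    $in_stock = trim($nodes->item(0)->nodeValue);
}

echo $in_stock ?? 'not found', "\n";
```

If the expression stops matching because the page layout changed, this prints "not found" instead of failing.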

Grzegorz Szpetkowski, answered Dec 21 '25 02:12


