How to loadHTMLFile() when it fails with 'htmlParseEntityRef: no name' error?

Tags: html, dom, php, xpath

I'm trying to get the string "hinson lou ann" out of:

 <div class='owner-name'>hinson lou ann</div>

When I run the following:

$html = "http://gisapps.co.union.nc.us/ws/rest/v2/cm_iw.ashx?gid=12339";
$doc  = new DOMDocument();
$doc->loadHTMLFile($html);
$xpath    = new DOMXpath($doc);
$elements = $xpath->query("*/div[@class='owner-name']");
if (!is_null($elements)) {
    foreach ($elements as $element) {
        echo "<br/>[" . $element->nodeName . "]";
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {

            echo $node->nodeValue . "\n";
        }
    }
}

I get an error of:

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: htmlParseEntityRef: no name in http://gisapps.co.union.nc.us/ws/rest/v2/cm_iw.ashx?gid=12339, line: 1 in /home... on line ...

The warning points at the line that calls loadHTMLFile().

Note: The file is not valid HTML; it only contains div tags. What if I loaded the file contents first and then wrapped them in html and body tags?

Asked by tyler on Dec 04 '22 10:12


1 Answer

If you really must try to parse it, try this:

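<?php
// Fetch the raw response; it is a bare run of <div> tags, not a full HTML document.
$html = file_get_contents("http://gisapps.co.union.nc.us/ws/rest/v2/cm_iw.ashx?gid=12339");

$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->recover = true;
// Wrap the fragment in <html><body>; the @ suppresses the parser warnings.
@$doc->loadHTML("<html><body>" . $html . "</body></html>");

$xpath = new DOMXpath($doc);
// The leading // searches the whole document for the div.
$elements = $xpath->query("//div[@class='owner-name']");

// query() returns false (not null) when the expression is malformed.
if ($elements !== false) {
    foreach ($elements as $element) {
        echo "<br/>[" . $element->nodeName . "]";
        foreach ($element->childNodes as $node) {
            echo $node->nodeValue . "\n";
        }
    }
}
?>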
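
If you would rather not silence everything with @, a minimal sketch of the same idea using libxml's internal error buffer is below. The inline $fragment is a hypothetical stand-in for the remote response, not the real data; I am assuming the real page contains an unescaped "&" somewhere, since that is the usual trigger for "htmlParseEntityRef: no name". Which messages you get back depends on your libxml2 version.

```php
<?php
// Hypothetical stand-in for the remote response: a bare fragment with an
// unescaped "&" in it (an assumption about what the real page contains).
$fragment = "<div class='owner-name'>hinson lou ann</div><div class='note'>a & b</div>";

// Collect libxml errors in a buffer instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML("<html><body>" . $fragment . "</body></html>");

// Copy out whatever the parser complained about, then clear the buffer
// and restore the previous error-handling mode.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = trim($error->message);
}
libxml_clear_errors();
libxml_use_internal_errors($previous);

print_r($messages);
```

This way the document still loads, but you can log or inspect the parser messages rather than throwing them away.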

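PS: Your XPath was wrong, so I fixed it. The relative path */div only matches a div that sits directly under the root element, while a leading // searches the whole document. Also note that the div (.owner-name) has no child elements, only a single text node, so the inner loop just prints that text; you can read $element->nodeValue directly instead.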
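
To make the XPath difference concrete, here is a small sketch that runs both expressions against the same document. The inline $fragment is a hypothetical copy of the div from the question, so nothing here touches the real URL.

```php
<?php
// Hypothetical inline fragment standing in for the remote response.
$fragment = "<div class='owner-name'>hinson lou ann</div>";

$doc = new DOMDocument();
$doc->loadHTML("<html><body>" . $fragment . "</body></html>");
$xpath = new DOMXPath($doc);

// Relative path from the document node: only matches a <div> that is a
// direct child of <html>, so the <div> inside <body> is missed.
$relative = $xpath->query("*/div[@class='owner-name']");

// Descendant search: matches the <div> wherever it sits in the tree.
$anywhere = $xpath->query("//div[@class='owner-name']");

echo $relative->length, "\n";
echo $anywhere->length, "\n";

$names = [];
foreach ($anywhere as $div) {
    // textContent is the concatenated text of the element and its descendants.
    $names[] = trim($div->textContent);
}
echo implode("\n", $names), "\n";
```

The first query finds nothing and the second finds the one div, which is why the original code printed no names even once the document loaded.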

Answered by Rob W on Dec 06 '22 23:12