 

PHP DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity

I'm trying to get the "link" elements from certain webpages, but I can't figure out what I'm doing wrong. I'm getting the following error:

Severity: Warning

Message: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 536

Filename: controllers/test.php

Line Number: 34

Line 34 is the following in the code:

      $dom->loadHTML($html);

My code:

    $url = "http://www.amazon.com/";

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);

    $html = curl_exec($ch);
    curl_close($ch); // close the handle whether or not the request succeeded

    if ($html) {
        // parse the html into a DOMDocument
        $dom = new DOMDocument();

        $dom->recover = true;
        $dom->strictErrorChecking = false;

        $dom->loadHTML($html);

        $hrefs = $dom->getElementsByTagName('a');

        echo "<pre>";
        print_r($hrefs);
        echo "</pre>";
    } else {
        echo "The website could not be reached.";
    }
asked Sep 08 '12 05:09 by David



2 Answers

This may be caused by a stray & that is immediately followed by something that cannot start an entity name, for example a tag or a space. If a name did follow the &, you would get a missing-semicolon error instead; see the related question "Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity".

The fix is to replace the & with &amp;. If you must keep that & as-is, you could try enclosing it in <![CDATA[ ... ]]>.
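A minimal sketch of the &amp; replacement, assuming you want to pre-process the fetched page yourself rather than fix its source. The helper name `escapeBareAmpersands` and the regular expression are illustrative assumptions, not part of the answer: it escapes every & that does not already begin a named, decimal, or hex character reference.

```php
<?php
// Hypothetical helper: escape every "&" that does not already start a
// character or entity reference such as &amp; &#169; or &#xA9;.
function escapeBareAmpersands(string $html): string
{
    // Negative lookahead skips "&name;", "&#123;" and "&#x1F;".
    return preg_replace(
        '/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/',
        '&amp;',
        $html
    );
}

$html = '<p>Tom & Jerry &amp; friends &copy; <a href="/a?x=1&y=2">link</a></p>';
echo escapeBareAmpersands($html), "\n";
// <p>Tom &amp; Jerry &amp; friends &copy; <a href="/a?x=1&amp;y=2">link</a></p>
```

Caveat: the pattern also rewrites & inside <script> and <style> blocks and escapes legacy entities written without a trailing semicolon (such as a lone `&copy`), so treat it as a shortcut for link extraction, not a general HTML sanitizer.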

answered Oct 05 '22 11:10 by Ujjwal Singh



It means some of the HTML is invalid. This is only a warning, not an error: libxml recovers and your script keeps processing the document. To suppress the warnings, call

    libxml_use_internal_errors(true);

before loadHTML(). Or you can suppress the warning completely with the @ error-control operator:

    @$dom->loadHTML($html);
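A sketch that combines the libxml_use_internal_errors() approach with the question's goal of reading links. It collects libxml's complaints instead of emitting warnings, restores the previous error mode afterwards, and returns each <a> element's href attribute rather than print_r-ing the DOMNodeList. The function name `extractHrefs` and the `$errors` out-parameter are hypothetical.

```php
<?php
// Hypothetical helper: parse messy HTML quietly and return every <a> href.
// Parser diagnostics are appended to $errors instead of raised as warnings.
function extractHrefs(string $html, array &$errors = []): array
{
    // Buffer libxml errors internally; remember the old setting.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Each entry is a LibXMLError (message, line, column, ...).
    foreach (libxml_get_errors() as $error) {
        $errors[] = trim($error->message) . ' (line ' . $error->line . ')';
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getAttribute() returns '' when the attribute is missing.
    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}

$errors = [];
$hrefs = extractHrefs('<p>Tom & Jerry</p><a href="/one">1</a><a href="/two">2</a>', $errors);
print_r($hrefs);
print_r($errors);
```

Unlike the @ operator, this keeps the diagnostics available, so you can log them or ignore them selectively.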
answered Oct 05 '22 09:10 by Kris
