
Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity,

Tags:

php

$html = file_get_contents("http://www.somesite.com/");

$dom = new DOMDocument();
$dom->loadHTML($html);

echo $dom;

throws

Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity,
Catchable fatal error: Object of class DOMDocument could not be converted to string in test.php on line 10
gweg asked Nov 06 '09 03:11


7 Answers

To suppress the warning, you can use libxml_use_internal_errors(true):

// create new DOMDocument
$document = new \DOMDocument('1.0', 'UTF-8');

// set error level
$internalErrors = libxml_use_internal_errors(true);

// load HTML
$document->loadHTML($html);

// Restore error level
libxml_use_internal_errors($internalErrors);
Dewsworld answered Nov 18 '22 17:11


I would bet that if you looked at the source of http://www.somesite.com/ you would find special characters that haven't been encoded as HTML entities. Maybe something like this:

<a href="/script.php?foo=bar&hello=world">link</a>

which should be:

<a href="/script.php?foo=bar&amp;hello=world">link</a>
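
If you generate such links yourself in PHP, one way to avoid the stray ampersand is to pass the URL through htmlspecialchars() before placing it in an attribute. This is a minimal sketch; the variable names are illustrative and not taken from the question:

```php
<?php
// Hypothetical URL with a raw "&" between query parameters.
$url = '/script.php?foo=bar&hello=world';

// htmlspecialchars() encodes &, <, > and quotes, so the value is safe
// to place inside an HTML attribute.
$link = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">link</a>';

echo $link, PHP_EOL;
// prints: <a href="/script.php?foo=bar&amp;hello=world">link</a>
```

This only helps for markup you produce. For a third-party page, the parser warning has to be handled on your side.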
mattalxndr answered Nov 18 '22 16:11


$dom->@loadHTML($html);

This is not valid PHP; the @ operator must precede the whole expression. Use this instead:

@$dom->loadHTML($html);
Maanas Royy answered Nov 18 '22 16:11


There are two errors. The second occurs because $dom is an object, not a string, and therefore cannot be echoed. The first is a warning from loadHTML, caused by invalid syntax in the HTML document being loaded (probably an ampersand (&) used as a parameter separator and not escaped as the entity &amp;).

You can suppress this error message (not the error itself, just the message) by calling the function with the error control operator "@" (http://www.php.net/manual/en/language.operators.errorcontrol.php):

@$dom->loadHTML($html);
user279583 answered Nov 18 '22 17:11


The reason for your fatal error is that DOMDocument does not have a __toString() method and thus cannot be echoed.

You're probably looking for:

echo $dom->saveHTML();
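
Put together, a sketch of the corrected script might look like the following. The inline $html string stands in for the fetched page, an assumption made only to keep the example self-contained:

```php
<?php
// Stand-in for file_get_contents("http://www.somesite.com/").
$html = '<html><body><p>Hello</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// saveHTML() serializes the document back into a string, which can be echoed.
$output = $dom->saveHTML();
echo $output;
```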
Mike B answered Nov 18 '22 18:11


Regardless of the echo (which would need to be replaced with print_r or var_dump), if loading fails the resulting object should stay empty:

DOMNodeList Object
(
)

Solution

  1. Set recover to true, and strictErrorChecking to false

    $content = file_get_contents($url);
    
    $doc = new DOMDocument();
    $doc->recover = true;
    $doc->strictErrorChecking = false;
    $doc->loadHTML($content);
    
  2. Use PHP's entity encoding on the markup's contents; unencoded characters are the most common source of this error.
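
For the second option, one possible approach (an assumption, since the answer does not name a function) is to encode only the ampersands that do not already start an entity before handing the markup to loadHTML(). The helper name escapeStrayAmpersands is hypothetical:

```php
<?php
// Hypothetical helper: replace "&" with "&amp;" unless it already begins
// a named or numeric entity such as "&amp;" or "&#169;".
function escapeStrayAmpersands(string $html): string
{
    // Fall back to the untouched input if preg_replace() reports an error.
    return preg_replace('/&(?![A-Za-z0-9#]+;)/', '&amp;', $html) ?? $html;
}

$content = '<a href="/script.php?foo=bar&hello=world">Tom &amp; Jerry</a>';
$clean = escapeStrayAmpersands($content);

$doc = new DOMDocument();
$doc->loadHTML($clean);

echo $clean, PHP_EOL;
```

A regular expression like this is only a heuristic. It can also rewrite ampersands inside inline scripts, so treat it as a starting point and not as a general fix.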

Lorenz Lo Sauer answered Nov 18 '22 16:11


Replace the simple

$dom->loadHTML($html);

with the more robust:

libxml_use_internal_errors(true);

if (!$dom->loadHTML($html)) {
    // Collect every message libxml recorded while parsing.
    $errors = "";
    foreach (libxml_get_errors() as $error) {
        $errors .= $error->message . "<br/>";
    }
    libxml_clear_errors();
    print "libxml errors:<br/>$errors";
    return;
}
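
Note that loadHTML() generally still returns true when the parser recovers from malformed markup, so the collected errors may be worth inspecting even on success. The following is a sketch under that assumption; the function name loadHtmlCollectingErrors is hypothetical:

```php
<?php
// Hypothetical helper: parse HTML quietly and return the document together
// with any messages libxml recorded while parsing.
function loadHtmlCollectingErrors(string $html): array
{
    // Buffer libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }

    // Empty the buffer and restore the caller's error-handling setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$loaded ? $dom : null, $messages];
}

[$dom, $messages] = loadHtmlCollectingErrors(
    '<a href="/script.php?foo=bar&hello=world">link</a>'
);

foreach ($messages as $message) {
    echo "libxml: $message", PHP_EOL;
}
```

Which messages appear, if any, depends on the installed libxml2 version, so the example does not assume a particular count.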
David Chan answered Nov 18 '22 16:11