I first ran the code on MAMP and it worked very well. But when I tried to run it on another server, I got a lot of warnings like:
Warning: DOMDocument::loadHTML(): Unexpected end tag : head in Entity, line: 3349 in /cgihome/zhang1/html/cgi-bin/getPrice.php on line 17
Warning: DOMDocument::loadHTML(): htmlParseStartTag: misplaced tag in Entity, line: 3350 in /cgihome/zhang1/html/cgi-bin/getPrice.php on line 17
Warning: DOMDocument::loadHTML(): Tag header invalid in Entity, line: 3517 in /cgihome/zhang1/html/cgi-bin/getPrice.php on line 17
The code is as follows:
<?php
$amazon = file_get_contents('http://www.amazon.com/blablabla');
$doc = new DOMDocument();
$doc->loadHTML($amazon);
$doc->saveHTML();
$price = $doc->getElementById('actualPriceValue')->textContent;
$ASIN = $doc->getElementById('ASIN')->getAttribute('value');
?>
Does anyone know what's going on? Thanks!
To disable the warnings, call this before loading the document:
libxml_use_internal_errors(true);
This works for me. See the manual entry for libxml_use_internal_errors for details.
Background: You are loading invalid HTML. Invalid HTML is quite common; DOMDocument::loadHTML corrects most of the problems, but emits warnings by default.
With libxml_use_internal_errors() you can control that behavior. Set it before loading the document:
$previously = libxml_use_internal_errors(true);
$doc->loadHTML($amazon);
Then, after loading, you can deal with the errors (if you want or need to):
/* @var LibXMLError[] $xmlErrors */
$xmlErrors = libxml_get_errors();
Finally, clear them (they accumulate across calls) and restore the previous setting:
unset($xmlErrors);
libxml_clear_errors();
libxml_use_internal_errors($previously);
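Putting those steps together, here is a minimal sketch of a helper that loads HTML with internal errors enabled, collects the buffered messages, and then restores the previous setting. The function name loadHtmlCollectingErrors and its [document, messages] return shape are hypothetical, chosen only for illustration; the libxml_* functions and the LibXMLError properties (line, message) are the real PHP ones referenced in this answer.

```php
<?php
// Hypothetical helper (not part of PHP): load HTML without emitting warnings,
// returning the document plus the libxml messages that would have been shown.
function loadHtmlCollectingErrors(string $html): array
{
    // Buffer libxml errors instead of raising PHP warnings; remember the old setting.
    $previously = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Copy the buffered LibXMLError objects out as "line N: message" strings.
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Clear the buffer (errors accumulate across calls) and restore the setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previously);

    return [$doc, $messages];
}

// Usage: the question's warnings included "Tag header invalid"; this input has a <header>.
[$doc, $messages] = loadHtmlCollectingErrors(
    '<html><body><header>x</header><span id="actualPriceValue">$9.99</span></body></html>'
);
echo $doc->getElementById('actualPriceValue')->textContent, PHP_EOL;
foreach ($messages as $m) {
    echo $m, PHP_EOL;
}
```

Returning the messages instead of printing them lets the caller decide whether to log them, ignore them, or fail. Which messages appear depends on the libxml2 version the server's PHP is built against, so do not rely on a particular message text.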
References
libxml_use_internal_errors: Disable libxml errors and allow user to fetch error information as needed
libxml_clear_errors: Clear libxml error buffer
libxml_get_errors: Retrieve array of errors
LibXMLError: The LibXMLError class

This problem comes from markup that is not clean (X)HTML.
DOMDocument::loadHTML() emits a warning for each piece of broken markup it has to repair, so cleaning up the HTML before loading it avoids the warnings.
PHP has an extension that does the job pretty well, called Tidy (php.net/book.tidy).
It might be tricky, as you may need to enable it in your php.ini.
Then:
// $html holds the raw page markup (e.g. $amazon from the question)
$tidy_config = array(
    'clean' => true,
    'output-xhtml' => true,
    'show-body-only' => true,
    'wrap' => 0,
);
$tidy = tidy_parse_string($html, $tidy_config, 'utf8');
$tidy->cleanRepair();
$doc = new DOMDocument();
$doc->loadHTML((string) $tidy);
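Because Tidy may be enabled on one server and missing on another (as with MAMP versus the other server in the question), one option is to check extension_loaded('tidy') and fall back to the unrepaired HTML when it is absent. This is a minimal sketch under that assumption; the function name repairAndLoad is hypothetical, and the Tidy options are the ones from the snippet above.

```php
<?php
// Hypothetical helper: repair HTML with Tidy when the extension is available,
// otherwise load the markup unchanged. Either way, return a DOMDocument.
function repairAndLoad(string $html): DOMDocument
{
    if (extension_loaded('tidy')) {
        $tidy = tidy_parse_string($html, [
            'clean' => true,
            'output-xhtml' => true,
            'show-body-only' => true, // only the <body> content is kept
            'wrap' => 0,
        ], 'utf8');
        $tidy->cleanRepair();
        $html = (string) $tidy;
    }

    $doc = new DOMDocument();
    // Without Tidy (or for markup Tidy leaves as-is), this may still emit warnings.
    $doc->loadHTML($html);
    return $doc;
}

// Usage
$doc = repairAndLoad('<p id="x">hi</p>');
echo $doc->getElementById('x')->textContent, PHP_EOL;
```

If warnings remain after repairing, this can be combined with libxml_use_internal_errors() as described in the previous answer.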
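You can suppress the warnings with PHP's @ error-control operator (note that this silences every warning from that call, not only the parser's):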
@$doc->loadHTML($amazon);