 

DOMDocument::loadHTML(): warning - htmlParseEntityRef: no name in Entity

I have found several similar questions, but so far, none have been able to help me.

I am trying to output the 'src' of all images in a block of HTML, so I'm using DOMDocument(). This method is actually working, but I'm getting a warning on some pages, and I can't figure out why. Some posts suggested suppressing the warning, but I'd much rather find out why the warning is being generated.

Warning: DOMDocument::loadHTML(): htmlParseEntityRef: no name in Entity, line: 10

One example of $post->post_content that is generating the error is:

On Wednesday 21st November specialist rights of way solicitor Jonathan Cheal of Dyne Drewett will be speaking at the Annual Briefing for Rural Practice Surveyors and Agricultural Valuers in Petersfield.
<br>
Jonathan is one of many speakers during the day and he is specifically addressing issues of public rights of way and village greens.
<br>
Other speakers include:-
<br>
<ul>
<li>James Atrrill, Chairman of the Agricultural Valuers Associates of Hants, Wilts and Dorset;</li>
<li>Martin Lowry, Chairman of the RICS Countryside Policies Panel;</li>
<li>Angus Burnett, Director at Martin & Company;</li>
<li>Esther Smith, Partner at Thomas Eggar;</li>
<li>Jeremy Barrell, Barrell Tree Consultancy;</li>
<li>Robin Satow, Chairman of the RICS Surrey Local Association;</li>
<li>James Cooper, Stnsted Oark Foundation;</li>
<li>Fenella Collins, Head of Planning at the CLA; and</li>
<li>Tom Bodley, Partner at Batcheller Monkhouse</li>
</ul>

I can post more examples of what $post->post_content contains if that would be helpful.

I have temporarily allowed access to a development site so you can see some examples [Note: links are no longer accessible as the question has been answered]:

  • Error - http://test.dynedrewett.com/specialist-solicitor-speaks-at-petersfield-update/
  • No error - http://test.dynedrewett.com/restrictive-covenants-in-employment-contracts/

Any tips on how to resolve this? Thanks.

$images = array(); // collects the src of every <img> found
$dom = new DOMDocument();
$dom->loadHTML(apply_filters('the_content', $post->post_content)); // Have tried stripping all tags but <img>, still generates warning
$nodes = $dom->getElementsByTagName('img');
foreach($nodes as $img) :
    $images[] = $img->getAttribute('src');
endforeach;
asked Feb 01 '13 at 14:02 by David Gard



4 Answers

This correct answer comes from a comment by @lonesomeday.

My best guess then is that there is an unescaped ampersand (&) somewhere in the HTML. This will make the parser think we're in an entity reference (e.g. &copy;). When it gets to ;, it thinks the entity is over. It then realises that what it has doesn't conform to an entity, so it sends out a warning and returns the content as plain text.
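In the sample post above, the item "Angus Burnett, Director at Martin & Company;" contains exactly such a bare ampersand. Because the `&` is followed by a space, the parser finds no entity name at all, hence "no name". Counting from the first line of the sample, that item is line 10, which appears to match the `line: 10` in the warning (the_content filters may shift line numbers, so treat this as a likely match rather than a certainty). A minimal sketch of escaping bare ampersands before parsing follows; `escape_bare_ampersands()` is a hypothetical helper, and its regex only checks entity *syntax* (`&name;`, `&#123;`, `&#x7B;`), not whether a named entity actually exists.

```php
<?php
/**
 * Hypothetical helper: replace every "&" that does not start something
 * shaped like an entity reference (&name;, &#123; or &#x7B;) with "&amp;".
 */
function escape_bare_ampersands(string $html): string
{
    return preg_replace('/&(?![a-zA-Z][a-zA-Z0-9]*;|#[0-9]+;|#[xX][0-9a-fA-F]+;)/', '&amp;', $html);
}

// Sample markup: one bare "&" (from the question) and one valid entity.
$content = '<ul><li>Angus Burnett, Director at Martin & Company;</li></ul>'
    . '<p>&copy; 2013 <img src="/logo.png"></p>';

$dom = new DOMDocument();
$dom->loadHTML(escape_bare_ampersands($content)); // bare "&" is now "&amp;"

// Same image-collecting loop as in the question.
$images = [];
foreach ($dom->getElementsByTagName('img') as $img) {
    $images[] = $img->getAttribute('src');
}
print_r($images);
```

In WordPress you would presumably apply it to the filtered content, i.e. `escape_bare_ampersands(apply_filters('the_content', $post->post_content))`. Fixing the ampersand in the post itself (writing `&amp;`) is the cleaner long-term option.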

answered Oct 10 '22 at 03:10 by David Gard



As mentioned in the related question "Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity", you can use:

libxml_use_internal_errors(true);

See http://php.net/manual/en/function.libxml-use-internal-errors.php
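Note that `libxml_use_internal_errors(true)` stops the warnings from being emitted, but libxml still records them, so the cause can be inspected rather than just hidden (which suits the asker's wish to find out why the warning appears). A minimal sketch using `libxml_get_errors()` and `libxml_clear_errors()`; the sample HTML is illustrative, and the exact message text depends on the libxml2 version, so no output is shown.

```php
<?php
$html = '<p>Angus Burnett, Director at Martin & Company</p>';

// Buffer libxml errors instead of emitting PHP warnings; remember the old mode.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Collect what the parser complained about so it can be logged or fixed.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}

// Empty the error buffer and restore the previous error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

print_r($messages);
```

Restoring the previous mode keeps the change local to this one parse instead of silencing libxml errors for the rest of the request.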

answered Oct 10 '22 at 04:10 by Ka.



Check for any "&" characters anywhere in your HTML code. I had this issue for exactly that reason.
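To find the offending "&" quickly, you could scan the content line by line and compare the result with the line number in the warning. A minimal sketch; `find_bare_ampersand_lines()` is a hypothetical helper, and the line numbers it returns are relative to the exact string you pass in, so run it on the same (filtered) content you give to `loadHTML()`.

```php
<?php
/**
 * Hypothetical helper: return the 1-based numbers of lines containing an "&"
 * that is not followed by something shaped like an entity reference.
 */
function find_bare_ampersand_lines(string $html): array
{
    $lines = [];
    // \R matches any newline sequence (\n, \r\n, \r).
    foreach (preg_split('/\R/', $html) as $index => $line) {
        if (preg_match('/&(?![a-zA-Z][a-zA-Z0-9]*;|#[0-9]+;|#[xX][0-9a-fA-F]+;)/', $line)) {
            $lines[] = $index + 1;
        }
    }
    return $lines;
}

// Excerpt from the question's post content.
$content = "Other speakers include:-\n<br>\n<ul>\n"
    . "<li>Angus Burnett, Director at Martin & Company;</li>\n"
    . "<li>Esther Smith, Partner at Thomas Eggar;</li>\n</ul>";

print_r(find_bare_ampersand_lines($content)); // the "Martin & Company" line
```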

answered Oct 10 '22 at 05:10 by Dhana



I don't have the reputation required to leave a comment above, but using htmlspecialchars solved this problem in my case:

$inputHTML = htmlspecialchars($post->post_content); // escapes &, <, > and quotes, so tags become plain text too
$images = array(); // collects the src of every <img> found
$dom = new DOMDocument();
$dom->loadHTML(apply_filters('the_content', $inputHTML)); // Have tried stripping all tags but <img>, still generates warning
$nodes = $dom->getElementsByTagName('img');
foreach($nodes as $img) :
    $images[] = $img->getAttribute('src');
endforeach;

For my purposes, I'm also using strip_tags($inputHTML, "<strong><em><br>"), so all image tags are stripped out as well - I'm not sure if this would be a problem otherwise.
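It would be a problem if you need the images: `htmlspecialchars()` escapes `<` and `>` as well as `&`, so the whole post becomes plain text and `getElementsByTagName('img')` finds nothing. A small sketch comparing that with escaping only the bare ampersands; the sample markup and the inline regex are illustrative assumptions, not part of this answer's original code.

```php
<?php
$content = '<p>Logo: <img src="/logo.png"> from Martin & Company</p>';

// Escaping the whole post turns "<" and ">" into "&lt;" / "&gt;",
// so the parser sees text rather than elements.
$dom = new DOMDocument();
$dom->loadHTML(htmlspecialchars($content));
$escapedCount = $dom->getElementsByTagName('img')->length;

// Escaping only "&" that does not start an entity keeps the markup intact.
$dom2 = new DOMDocument();
$dom2->loadHTML(preg_replace('/&(?![a-zA-Z][a-zA-Z0-9]*;|#[0-9]+;|#[xX][0-9a-fA-F]+;)/', '&amp;', $content));
$fixedCount = $dom2->getElementsByTagName('img')->length;

echo "img elements after htmlspecialchars(): $escapedCount\n"; // 0
echo "img elements after escaping only '&': $fixedCount\n";    // 1
```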

answered Oct 10 '22 at 03:10 by Good Idea
