Well, apparently PHP and its standard libraries have some problems, and DOMDocument is no exception.
There are workarounds for UTF-8 characters when loading an HTML string with $dom->loadHTML().
However, I haven't found a way to do this when loading HTML from a file with $dom->loadHTMLFile(). While it reads and sets the encoding from <meta /> tags, the problem strikes back if I haven't defined those, for instance when loading a fragment of HTML (a template part like footer.html) rather than a fully built HTML document.
So, how do I preserve UTF-8 characters when loading HTML from a file that doesn't have its <meta /> tags present, when defining those is not an option?
footer.html (the file is encoded in UTF-8 without BOM):
<div id="footer">
<p>My sūpēr ōzōm ūtf8 štrīņģ</p>
</div>
index.php:
$dom = new DOMDocument;
$dom->loadHTMLFile('footer.html');
echo $dom->saveHTML(); // results in all familiar effed' up characters
Thanks in advance!
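Try a hack like this one (read the file into $html yourself instead of calling loadHTMLFile()):
$html = file_get_contents('footer.html');
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
// dirty fix
foreach ($doc->childNodes as $item) {
    if ($item->nodeType == XML_PI_NODE) {
        $doc->removeChild($item); // remove hack
        break; // stop here: the node list is live and was just modified
    }
}
$doc->encoding = 'UTF-8'; // insert proper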
Several others are listed in the user comments here: http://php.net/manual/en/domdocument.loadhtml.php. It is also important that your document head include a meta tag specifying the encoding FIRST, directly after the <head> tag.
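For the file-based case from the question, the same hack can be wrapped in a small helper. This is only a rough sketch: loadUtf8HtmlFile() is a made-up name, not part of DOMDocument, and it assumes the file really is UTF-8 without a BOM, as footer.html is.

```php
<?php
// Hypothetical helper (not a DOMDocument method): loads a UTF-8 encoded
// HTML fragment from a file without relying on <meta /> tags.
function loadUtf8HtmlFile(string $path): DOMDocument
{
    $html = file_get_contents($path);
    if ($html === false) {
        throw new RuntimeException("Cannot read $path");
    }

    $doc = new DOMDocument();
    // The prepended XML declaration is only a hint so that libxml
    // decodes the markup as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);

    // Copy the live node list first so removing a node cannot
    // disturb the iteration.
    foreach (iterator_to_array($doc->childNodes) as $node) {
        if ($node->nodeType === XML_PI_NODE) {
            $doc->removeChild($node); // remove the hint again
        }
    }
    $doc->encoding = 'UTF-8';

    return $doc;
}
```

Usage would then be echo loadUtf8HtmlFile('footer.html')->saveHTML(); in index.php. Note that the parser still wraps the fragment in the usual html and body elements, so the output is a full document rather than the bare fragment.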