Preserve utf8 when loading HTML from file

Question

Well, apparently, PHP and its standard libraries have some problems, and DOMDocument isn't an exception.

There are workarounds for UTF-8 characters when loading an HTML string with $dom->loadHTML().

However, I haven't found a way to do the same when loading HTML from a file with $dom->loadHTMLFile(). It reads the encoding from <meta /> tags and applies it, but the problem returns when those tags aren't defined, for instance when loading an HTML fragment (a template part such as footer.html) rather than a complete HTML document.

So, how do I preserve UTF-8 characters when loading HTML from a file that has no <meta /> tags, and where adding them is not an option?

Update

footer.html (the file is encoded in UTF-8 without BOM):

<div id="footer">
    <p>My sūpēr ōzōm ūtf8 štrīņģ</p>
</div>

index.php:

$dom = new DOMDocument;
$dom->loadHTMLFile('footer.html');
echo $dom->saveHTML(); // the non-ASCII characters come out garbled

Thanks in advance!

Sinthia V · Accepted Answer

Try a hack like this one:

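// Read the file yourself so the encoding hint can be prepended
$html = file_get_contents('footer.html');
$doc = new DOMDocument();
// The fake XML declaration tells libxml to treat the input as UTF-8
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
// dirty fix: copy the child list first, since removing nodes changes the live list
foreach (iterator_to_array($doc->childNodes) as $item) {
    if ($item->nodeType == XML_PI_NODE) {
        $doc->removeChild($item); // remove hack
    }
}
$doc->encoding = 'UTF-8'; // insert proper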
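
Since the question is about loadHTMLFile(), the same hack can be applied to a file by reading it into a string first. The following is only a minimal sketch, assuming PHP 7 or later with the DOM extension enabled. The function name loadUtf8HtmlFile is made up for illustration, and whether the fake XML declaration is honoured may depend on the libxml2 version PHP is built against:

```php
<?php
// Hypothetical helper (not part of PHP): load an HTML fragment file as UTF-8.
function loadUtf8HtmlFile(string $path): DOMDocument
{
    $html = file_get_contents($path);
    if ($html === false) {
        throw new RuntimeException("Cannot read $path");
    }

    $doc = new DOMDocument();
    // Fragments are rarely fully valid HTML, so collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    // The prepended pseudo XML declaration acts as the encoding hint.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the child list first, because removing nodes changes the live list.
    foreach (iterator_to_array($doc->childNodes) as $node) {
        if ($node->nodeType === XML_PI_NODE) {
            $doc->removeChild($node); // drop the encoding hint again
        }
    }
    $doc->encoding = 'UTF-8';

    return $doc;
}
```

With this in place, index.php would call loadUtf8HtmlFile('footer.html') instead of $dom->loadHTMLFile('footer.html') and then echo saveHTML() as before.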

Several other workarounds are listed in the user comments here: http://php.net/manual/en/domdocument.loadhtml.php. It is also important that your document head include a meta tag specifying the encoding FIRST, directly after the <head> tag.
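
If the file itself cannot carry a meta tag, one option is to add the tag in memory only, just before parsing. This is a rough sketch under the same assumptions as above (PHP 7 or later, DOM extension). Both function names, loadFragmentWithMeta and saveBodyHtml, are hypothetical, and the second one exists only to print the fragment without the html/head/body wrapper that the parser adds:

```php
<?php
// Hypothetical helper: parse a fragment as UTF-8 by prepending a meta tag in memory.
// The file on disk is left untouched.
function loadFragmentWithMeta(string $path): DOMDocument
{
    $html = file_get_contents($path);
    if ($html === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // The charset declaration goes first so the parser sees it before any non-ASCII text.
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($meta . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// Hypothetical helper: serialise only the children of <body>, i.e. the fragment itself.
function saveBodyHtml(DOMDocument $doc): string
{
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}
```

Usage would look like echo saveBodyHtml(loadFragmentWithMeta('footer.html')); the injected meta tag ends up in the generated head, so it does not appear in the printed fragment.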

Tags:

php

encoding

utf-8

domdocument

tomsseisums
