I cannot figure out how to stop DOMDocument from mangling these characters.
<?php
$doc = new DOMDocument();
$doc->substituteEntities = false;
$doc->loadHTML('<p>¯\(°_o)/¯</p>');
print_r($doc->saveHTML());
?>
Expected Output: ¯\(°_o)/¯
Actual Output: Â¯\(Â°_o)/Â¯
http://codepad.org/W83eHSsT
I found a hint in the comments on the DOMDocument::loadHTML documentation (comment from <mdmitry at gmail dot com>, 21-Dec-2009 05:02):
"You can also load HTML as UTF-8 using this simple hack:"
Just prepend '<?xml encoding="UTF-8">' to the HTML input:
$doc = new DOMDocument();
//$doc->substituteEntities = false;
$doc->loadHTML('<?xml encoding="UTF-8">' . '<p>¯\(°_o)/¯</p>');
print_r($doc->saveHTML());
Putting
<?xml version="1.0" encoding="utf-8">
at the top of the document takes care of the encoding for both saveXML and saveHTML.
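If you would rather not prepend a fake XML declaration, another commonly used workaround is to turn every non-ASCII character into a numeric character reference before parsing, so the parser never has to guess the input encoding. The snippet below is only a minimal sketch of that idea, not the approach from the documentation comment. It assumes the mbstring extension is loaded and that the input string is valid UTF-8; the variable names are made up for illustration.

```php
<?php
// Sketch: encode non-ASCII characters as numeric entities before parsing.
// Assumptions: mbstring is available and $html is valid UTF-8.
$html = '<p>¯\(°_o)/¯</p>';

// Convert every code point from U+0080 upward into a numeric character reference.
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$encoded = mb_encode_numericentity($html, $map, 'UTF-8');

$doc = new DOMDocument();
$doc->loadHTML($encoded);

// DOM exposes string values as UTF-8, so the text node holds the original characters.
$text = $doc->getElementsByTagName('p')->item(0)->textContent;
echo $text, "\n";
```

Because the string handed to loadHTML is pure ASCII, the result does not depend on the parser's default encoding. Note that saveHTML may still serialize these characters as entities, so read node values (as above) if you need the raw UTF-8 text.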