Disable html entity encoding in PHP DOMDocument

Tags: dom, php

I cannot figure out how to stop DOMDocument from mangling these characters.

<?php

$doc = new DOMDocument();
$doc->substituteEntities = false;
$doc->loadHTML('<p>¯\(°_o)/¯</p>');
print_r($doc->saveHTML());

?>

Expected Output: ¯\(°_o)/¯

Actual Output: Â¯\(Â°_o)/Â¯

http://codepad.org/W83eHSsT
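To see where the stray "Â" characters come from, here is a minimal sketch. It assumes the PHP file is saved as UTF-8 and that loadHTML() falls back to ISO-8859-1 when the markup declares no charset, which has long been libxml2's default for HTML. The helper latin1BytesToUtf8() is hypothetical and written only for illustration: it reproduces the mis-decoding by hand rather than calling DOMDocument, so its result does not depend on the installed libxml2 version.

```php
<?php
// Hypothetical helper for illustration: reinterpret each byte of a string as an
// ISO-8859-1 character and re-encode it as UTF-8, which is what happens when
// UTF-8 input is parsed as if it were ISO-8859-1.
function latin1BytesToUtf8(string $bytes): string
{
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $b = ord($byte);
        if ($b < 0x80) {
            $out .= $byte; // ASCII bytes mean the same thing in both encodings
        } else {
            // ISO-8859-1 byte 0x80..0xFF is code point U+0080..U+00FF,
            // which takes two bytes in UTF-8.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$input = '¯\(°_o)/¯';                 // assumes this file is saved as UTF-8
echo bin2hex('¯'), "\n";               // c2af: "¯" is two bytes in UTF-8
echo latin1BytesToUtf8($input), "\n";  // Â¯\(Â°_o)/Â¯
```

Each two-byte UTF-8 character turns into two Latin-1 characters, and the lead byte 0xC2 is "Â", which is why every non-ASCII character in the output gains an "Â" in front of it.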

asked Aug 20 '11 at 23:08 by anonymous


2 Answers

I've found a hint in the comments of the DOMDocument::loadHTML documentation:

(Comment from <mdmitry at gmail dot com> 21-Dec-2009 05:02: "You can also load HTML as UTF-8 using this simple hack:")

Just add '<?xml encoding="UTF-8">' before the HTML input:

$doc = new DOMDocument();
// substituteEntities is not needed for this fix
// Prepending an XML declaration makes the parser read the input as UTF-8
$doc->loadHTML('<?xml encoding="UTF-8">' . '<p>¯\(°_o)/¯</p>');
print_r($doc->saveHTML());
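A slightly fuller sketch of the same hack, assuming PHP 7.4 or later with the standard DOM extension. The loop that removes leftover processing-instruction nodes and the $doc->encoding assignment go beyond the answer; they are a common clean-up step, not something the hack requires.

```php
<?php
libxml_use_internal_errors(true); // collect parser warnings instead of printing them

$doc = new DOMDocument();
// Prepend the declaration from the answer so the parser reads the input as UTF-8.
$doc->loadHTML('<?xml encoding="UTF-8">' . '<p>¯\(°_o)/¯</p>');

// Clean-up step (an addition, not part of the original answer): drop any
// processing-instruction node the declaration left at the top of the tree.
$pis = [];
foreach ($doc->childNodes as $node) {
    if ($node->nodeType === XML_PI_NODE) {
        $pis[] = $node; // collect first; removing while iterating a live list can skip nodes
    }
}
foreach ($pis as $pi) {
    $doc->removeChild($pi);
}
$doc->encoding = 'UTF-8'; // record the encoding on the document for later serialization

$p = $doc->getElementsByTagName('p')->item(0);
echo $p->textContent, "\n";
echo $doc->saveHTML($p), "\n"; // serialize just the <p> element
```

Passing a node to saveHTML() serializes only that node, so the output is not wrapped in the <html> and <body> elements that loadHTML() adds around a fragment.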
answered Oct 16 '22 at 16:10 by feeela


Putting

<?xml version="1.0" encoding="utf-8">

at the top of the document fixes the encoding problem for both saveXML() and saveHTML().
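A small check of this variant, assuming the same setup as the question. firstParagraphText() is a hypothetical helper written only for this comparison. It shows that the shorter declaration from the first answer and the fuller one here lead to the same decoded text; it does not compare the exact saveXML() or saveHTML() strings.

```php
<?php
libxml_use_internal_errors(true); // keep parser warnings out of the output

// Hypothetical helper: load an HTML string with the given prefix in front of it
// and return the text content of the first <p> element ('' if there is none).
function firstParagraphText(string $prefix, string $html): string
{
    $doc = new DOMDocument();
    $doc->loadHTML($prefix . $html);
    $p = $doc->getElementsByTagName('p')->item(0);
    return $p === null ? '' : $p->textContent;
}

$html  = '<p>¯\(°_o)/¯</p>';
$short = firstParagraphText('<?xml encoding="UTF-8">', $html);
$full  = firstParagraphText('<?xml version="1.0" encoding="utf-8">', $html);

var_dump($short === $full); // both declarations should decode the input the same way
echo $full, "\n";
```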

answered Oct 16 '22 at 18:10 by love2code94