This is my code:
$oDom = new DOMDocument();
$oDom->loadHTML("èàéìòù");
echo $oDom->saveHTML();
This is the output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <html><body><p>èà éìòù</p></body></html>
I want this output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <html><body><p>èàéìòù</p></body></html>
I've tried with ...
$oDom = new DomDocument('4.0', 'UTF-8');
or with 1.0 and other variations, but nothing worked.
One more thing: is there a way to get the HTML back untouched? For example, with this HTML as input: <p>hello!</p>
I would like to get the same output: <p>hello!</p>
using DOMDocument only to parse the DOM and make some substitutions inside the tags.
Solution:
$oDom = new DOMDocument();
$oDom->encoding = 'utf-8';
$oDom->loadHTML( utf8_decode( $sString ) ); // important!
$sHtml = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">';
$sHtml .= $oDom->saveHTML( $oDom->documentElement ); // important!
The saveHTML() method behaves differently when a node is passed to it. You can pass the root node ($oDom->documentElement) and add the desired !DOCTYPE manually. The other important part is utf8_decode(). In my case, none of the other attributes and methods of the DOMDocument class produced the desired result.
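Note that utf8_decode() is deprecated as of PHP 8.2. As a rough sketch of the same idea without it, assuming the input string is valid UTF-8, you can prepend an XML encoding hint so the parser reads the input as UTF-8, and still serialize through the root node. The function name renderUtf8Html is made up for this example and is not part of the answer above.

```php
<?php
// Hypothetical helper (not from the original answer): parse a UTF-8 HTML
// string and serialize it through the root node.
function renderUtf8Html(string $sString): string
{
    $oDom = new DOMDocument();

    // Assumption: $sString is valid UTF-8. The prefix is only a hint that
    // makes the HTML parser read the input as UTF-8 instead of its default.
    $oDom->loadHTML('<?xml encoding="UTF-8">' . $sString);

    // Passing a node serializes only that subtree (<html>...</html>),
    // so no DOCTYPE is emitted; add one manually if you need it.
    return $oDom->saveHTML($oDom->documentElement);
}

echo renderUtf8Html('èàéìòù'), "\n";
```

Whether this behaves identically on every PHP/libxml combination is not something I have checked, so treat it as a starting point and compare the output against your own input.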
Try to set the encoding type after you have loaded the HTML.
$dom = new DOMDocument();
$dom->loadHTML($data);
$dom->encoding = 'utf-8';
echo $dom->saveHTML();
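For the second part of the question (getting <p>hello!</p> back without the added DOCTYPE, <html> and <body>), one possible sketch is to serialize only the children of <body>, relying on the fact that saveHTML() accepts a node. The helper name saveFragment is hypothetical, and the UTF-8 prefix assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: return the markup of a fragment without the
// DOCTYPE, <html> and <body> wrappers that loadHTML() adds.
function saveFragment(string $sString): string
{
    $oDom = new DOMDocument();

    // Assumption: the input is UTF-8; the prefix hints that to the parser.
    $oDom->loadHTML('<?xml encoding="UTF-8">' . $sString);

    $oBody = $oDom->getElementsByTagName('body')->item(0);
    $sHtml = '';

    if ($oBody !== null) {
        // Serialize each top-level node of the fragment on its own.
        foreach ($oBody->childNodes as $oChild) {
            $sHtml .= $oDom->saveHTML($oChild);
        }
    }

    return $sHtml;
}

echo saveFragment('<p>hello!</p>'), "\n";
```

The parser may still normalize the markup (for example closing unclosed tags or rewriting attribute quoting), so the result is only guaranteed to match the input for HTML that is already well formed.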
Other way