
DomDocument and special characters

This is my code:

$oDom = new DOMDocument();
$oDom->loadHTML("èàéìòù");
echo $oDom->saveHTML();

This is the output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <html><body><p>&Atilde;&uml;&Atilde;&nbsp;&Atilde;&copy;&Atilde;&not;&Atilde;&sup2;&Atilde;&sup1;</p></body></html> 

I want this output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <html><body><p>èàéìòù</p></body></html> 

I've tried passing a version and an encoding to the constructor:

$oDom = new DomDocument('4.0', 'UTF-8');

I've also tried 1.0 and other variations, but nothing changed.

One more thing: is there a way to get the same HTML back untouched? For example, given the input <p>hello!</p>, I'd like to get exactly <p>hello!</p> as output, using DOMDocument only to parse the DOM and make some substitutions inside the tags.

Francesco Casula asked Jul 04 '11 15:07


2 Answers

Solution:

$oDom = new DOMDocument();
$oDom->encoding = 'utf-8';
$oDom->loadHTML( utf8_decode( $sString ) ); // important!

$sHtml = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">';
$sHtml .= $oDom->saveHTML( $oDom->documentElement ); // important!

The saveHTML() method behaves differently when you pass it a node. You can pass the root node ($oDom->documentElement) and add the desired !DOCTYPE manually. The other important piece is utf8_decode(). In my case, none of the other attributes and methods of the DOMDocument class produced the desired result.
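Since utf8_decode() is deprecated as of PHP 8.2, here is a rough sketch of the same idea without it, which also touches on the "untouched HTML" part of the question. It assumes the mbstring extension is available, and the helper names loadHtmlAsUtf8() and bodyInnerHtml() are made up for illustration. Non-ASCII characters are turned into numeric entities before parsing, so the parser never has to guess the encoding, and only the children of <body> are serialized, so the DOCTYPE and wrapper tags are left out.

```php
<?php
// Hypothetical helpers for illustration; the names are not from the original answer.

// Parse a UTF-8 HTML fragment without relying on utf8_decode().
function loadHtmlAsUtf8(string $html): DOMDocument
{
    // Turn every non-ASCII character into a numeric entity (requires mbstring),
    // so the parser only ever sees plain ASCII bytes.
    $ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom;
}

// Serialize only the children of <body>, skipping the DOCTYPE and wrapper tags.
function bodyInnerHtml(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    $html = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            // Node-level serialization, as in the answer above.
            $html .= $dom->saveHTML($child);
        }
    }
    return $html;
}

$dom = loadHtmlAsUtf8('<p>èàéìòù</p>');
$p = $dom->getElementsByTagName('p')->item(0);
echo $p->textContent, "\n";
echo bodyInnerHtml($dom), "\n";
```

Note that this encodes the whole input string, so it is only a reasonable fit for plain markup; inline scripts or other content where entities are not interpreted would need separate handling.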

Francesco Casula answered Oct 13 '22 23:10


Try setting the encoding after you have loaded the HTML.

$dom = new DOMDocument();
$dom->loadHTML($data);
$dom->encoding = 'utf-8';
echo $dom->saveHTML();


SAIF answered Oct 13 '22 22:10