 

PHP DOMDocument is not rendering Unicode Characters Properly

I am using CKEditor to let users post comments, and users can also enter Unicode characters in the comment box.

When I submit the form and check $_POST["reply"], the Unicode characters display correctly. I have also sent header('Content-type:text/html; charset=utf-8'); at the top of the page. But when I process the text with PHP's DOMDocument, all the characters become unreadable:

$html_unicode = "xyz unicode data"; // placeholder for the UTF-8 text from $_POST["reply"]
$html_data = '<body>' . $html_unicode . '</body>';
$dom = new DOMDocument();
$dom->loadHTML($html_data);

$elements = $dom->getElementsByTagName('body');

When I echo the document's text content

echo $dom->textContent;

the output is:

§Ø³ÙبÙÙ ÙÙÚº غرÙب ک٠آÙÛ ÙÛÙ

How can I get the proper Unicode characters back using PHP DOMDocument?
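The garbled output is consistent with the UTF-8 bytes being read as ISO-8859-1 (Latin-1), which DOMDocument::loadHTML() typically assumes when the markup declares no charset. The minimal pure-PHP sketch below reproduces that misreading on a short sample; the helper name latin1BytesToUtf8 and the two-letter Urdu sample are assumptions for illustration only.

```php
<?php
// Hypothetical helper: treat every byte as one ISO-8859-1 code point
// (U+0000..U+00FF) and re-encode that code point as UTF-8. This mimics
// UTF-8 input being parsed as if it were Latin-1.
function latin1BytesToUtf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                   // ASCII bytes pass through
        } else {
            $out .= chr(0xC0 | ($b >> 6))      // 2-byte UTF-8 lead byte
                  . chr(0x80 | ($b & 0x3F));   // continuation byte
        }
    }
    return $out;
}

$original = "\u{0633}\u{0628}";            // sample Urdu text (assumption)
echo latin1BytesToUtf8($original), "\n";   // Ø³Ø¨
```

Each Urdu letter occupies two UTF-8 bytes, so it turns into two Latin-1 symbols such as "Ø³", the same pattern visible at the start of the garbled output above.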

asked Mar 29 '13 at 13:03 by Munib


3 Answers

This worked for me:

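$html_unicode = "xyz unicode data";
$html_data = '<body>' . $html_unicode . '</body>';

$dom = new DOMDocument();
// Turn non-ASCII characters into HTML entities so the parser's charset
// guess no longer matters (requires the mbstring extension).
// Note: the 'HTML-ENTITIES' pseudo-encoding is deprecated as of PHP 8.2.
$html_data = mb_convert_encoding($html_data, 'HTML-ENTITIES', 'UTF-8');
$dom->loadHTML($html_data);

$elements = $dom->getElementsByTagName('body');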
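Because 'HTML-ENTITIES' is deprecated in PHP 8.2, a minimal sketch of the same idea with mb_encode_numericentity() may be useful. It escapes every code point from U+0080 upward as a numeric entity such as &#1587;, which the HTML parser decodes regardless of the charset it assumes. The Urdu sample string is an assumption, and mbstring must be installed.

```php
<?php
// Sketch: escape non-ASCII characters as numeric entities before parsing.
$html_unicode = "\u{0633}\u{0628}";   // sample Urdu text (assumption)
$html_data = '<body>' . $html_unicode . '</body>';

// convmap: encode code points 0x80..0x10FFFF, offset 0, mask 0x1FFFFF
$escaped = mb_encode_numericentity($html_data, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

$dom = new DOMDocument();
$dom->loadHTML($escaped);

// The parser turns the entities back into real characters, and the DOM
// API returns them as UTF-8.
$body = $dom->getElementsByTagName('body')->item(0);
echo $body->textContent, "\n";
```

ASCII markup such as the tags themselves is left untouched because the convmap starts at 0x80.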
answered Nov 15 '22 at 16:11 by Andre


Try this :)

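<?php
    $html_unicode = "xyz unicode data";
    $html_data = '<body>' . $html_unicode . '</body>';
    $dom = new DOMDocument();
    $dom->loadHTML($html_data);

    $elements = $dom->getElementsByTagName('body');
    // Without a declared charset the input is typically read as ISO-8859-1,
    // so textContent comes back double-encoded; utf8_decode() maps it back
    // to the original UTF-8 bytes. This fixes only the echoed text, not the
    // DOM itself, and utf8_decode() is deprecated as of PHP 8.2.
    echo utf8_decode($dom->textContent);
?>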
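utf8_decode() works here because it writes each character in the range U+0000..U+00FF back as a single byte, undoing the Latin-1 misreading. Since it is deprecated in PHP 8.2, a minimal sketch of the same reversal with mbstring follows. It is demonstrated on a simulated garbled string rather than on real DOM output, and the helper name undoLatin1Misread is hypothetical.

```php
<?php
// Hypothetical helper: write each character U+0000..U+00FF back as one
// byte, which restores the original UTF-8 byte sequence.
function undoLatin1Misread(string $garbled): string
{
    return mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
}

$original = "\u{0633}\u{0628}";                                    // sample text
$garbled  = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1'); // simulate the bug

echo $garbled, "\n";                     // Ø³Ø¨
echo undoLatin1Misread($garbled), "\n";  // سب
```

This reversal is only safe when the whole string went through the misreading; genuine characters above U+00FF in the input would be replaced with "?".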
answered Nov 15 '22 at 15:11 by Rohit Subedi


Thank God, I got the solution just by replacing

$html_data = '<body>'.$html_unicode . '</body>';

with

$html_data = '<head><meta http-equiv="Content-Type" 
content="text/html; charset=utf-8">
</head><body>' . $html_unicode . '</body>';
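Declaring the charset in a meta tag lets libxml decode the input as UTF-8, so both the DOM and textContent hold the correct characters. A minimal sketch that wraps this into a reusable function is shown below. The function name loadUtf8Fragment and the Urdu sample are hypothetical.

```php
<?php
// Hypothetical helper: wrap an HTML fragment in a document that declares
// UTF-8 so that DOMDocument::loadHTML() decodes it correctly.
function loadUtf8Fragment(string $fragment): DOMDocument
{
    $html = '<html><head><meta http-equiv="Content-Type" '
          . 'content="text/html; charset=utf-8"></head>'
          . '<body>' . $fragment . '</body></html>';

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    return $dom;
}

$html_unicode = "\u{0633}\u{0628}";   // sample Urdu text (assumption)
$dom  = loadUtf8Fragment($html_unicode);
$body = $dom->getElementsByTagName('body')->item(0);
echo $body->textContent, "\n";        // prints the original text
```

Unlike the utf8_decode() approach, this fixes the parsed document itself, so later DOM queries and edits also see the correct characters.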
answered Nov 15 '22 at 16:11 by Munib