 

htmlentities() makes Chinese characters unusable

We have a web application where users can enter their own HTML in a text area, and we save that data to our database.

When we load the HTML data back into the text area, we of course run it through htmlentities() first. Otherwise, users could save </textarea> inside the textarea, and our application would break when loading that back into the textarea.
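A minimal sketch of why the escaping step is needed. The stored value below is a hypothetical example; the point is that escaping turns the user's own closing textarea tag into entities, so only the page's closing tag ends the element. The browser decodes the entities inside the textarea, so the user still sees and resubmits the original markup.

```php
<?php
// Hypothetical stored value: user HTML that contains a closing textarea tag.
$userHtml = '<p>Hello</p></textarea><script>alert(1)</script>';

// Unescaped: the browser would end the textarea at the user's </textarea>.
$unsafe = '<textarea>' . $userHtml . '</textarea>';

// Escaped: every < and > becomes an entity, so only our own closing tag remains.
$safe = '<textarea>' . htmlentities($userHtml, ENT_QUOTES, 'UTF-8') . '</textarea>';

echo substr_count($unsafe, '</textarea>'), "\n"; // 2
echo substr_count($safe, '</textarea>'), "\n";   // 1
```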

This works great, except when users enter Chinese characters (and probably other languages such as Arabic and Japanese).

htmlentities() turns the Chinese text into unusable garbage like this: �¨�³�¼�§ï. When I remove htmlentities() before loading the entered HTML into the text area, the Chinese characters show up just fine, but then the HTML interferes with our textarea again, especially when a user enters </textarea> inside the text area.
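The symptom is most likely a charset mismatch: before PHP 5.4, htmlentities() assumed ISO-8859-1 unless told otherwise, so each byte of a UTF-8 character was converted to a separate Latin-1 entity. A small sketch comparing the two charsets, assuming the PHP source file itself is saved as UTF-8:

```php
<?php
// "你好" encoded as UTF-8 is the six bytes E4 BD A0 E5 A5 BD.
$text = '你好';

// Charset declared as ISO-8859-1 (the default before PHP 5.4): every byte is
// treated as a separate Latin-1 character and turned into a named entity.
$latin1 = htmlentities($text, ENT_QUOTES, 'ISO-8859-1');

// Charset declared as UTF-8: the input is read as two Chinese characters,
// which have no named entities, so they are left untouched.
$utf8 = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $latin1, "\n"; // &auml;&frac12;&nbsp;&aring;&yen;&frac12;
echo $utf8, "\n";   // 你好
```

In a textarea the first result is displayed as the familiar mojibake "ä½ å¥½", which matches the kind of output described above.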

I hope that makes sense.

Does anyone know how we can correctly allow languages such as Chinese, Japanese, ... inside our text area, while still safely loading any HTML into it?

Jorre asked Jun 23 '11 10:06


2 Answers

Have you tried using htmlspecialchars?

I currently use that in production and it's fine.

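// htmlspecialchars() only escapes characters that are special in HTML (&, <, >, quotes),
// so multibyte UTF-8 text such as Chinese passes through unchanged.
$foo = "我的名字叫萨沙";
echo '<textarea>' . htmlspecialchars($foo) . '</textarea>';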
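A slightly fuller sketch of the htmlspecialchars() approach, with the flags and charset passed explicitly rather than relying on the version-dependent defaults. The input string is a made-up example mixing markup and Chinese text:

```php
<?php
// Hypothetical user input: Chinese text wrapped in HTML markup.
$input = '<b>我的名字叫萨沙</b>';

// With ENT_QUOTES, htmlspecialchars() rewrites &, <, >, " and '.
// The Chinese characters are not touched.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo '<textarea>' . $escaped . '</textarea>', "\n";
// <textarea>&lt;b&gt;我的名字叫萨沙&lt;/b&gt;</textarea>
```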

Alternatively,

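// Decode numeric character references (here 你好) back into UTF-8 text.
// Note: the 'HTML-ENTITIES' mbstring encoding is deprecated as of PHP 8.2.
$str = "&#20320;&#22909;";
echo mb_convert_encoding($str, 'UTF-8', 'HTML-ENTITIES');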

As found on http://www.techiecorner.com/129/php-how-to-convert-iso-character-htmlentities-to-utf-8/
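If text has already been stored as numeric entities, html_entity_decode() with an explicit UTF-8 charset is one way to get the characters back without the 'HTML-ENTITIES' pseudo-encoding. A minimal sketch using the same example string:

```php
<?php
// Numeric character references for 你 (U+4F60) and 好 (U+597D).
$str = '&#20320;&#22909;';

// html_entity_decode() converts the references to UTF-8 characters
// because the target charset is given as 'UTF-8'.
$decoded = html_entity_decode($str, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n"; // 你好
```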

sdolgy answered Oct 17 '22 00:10


Specify the charset, e.g. UTF-8, and it should work.

echo htmlentities($data, ENT_COMPAT, 'UTF-8'); 
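A sketch that wraps this call in a small helper; the function name escapeForTextarea() is hypothetical, and the sample input is made up to show which characters change:

```php
<?php
// Hypothetical helper; the charset argument is what fixes the Chinese text.
function escapeForTextarea(string $data): string
{
    // ENT_COMPAT escapes double quotes but leaves single quotes alone.
    // 'UTF-8' makes htmlentities() read the input as UTF-8, so characters
    // without a named entity (such as Chinese) pass through unchanged.
    return htmlentities($data, ENT_COMPAT, 'UTF-8');
}

$out = escapeForTextarea("<p class=\"x\">你好 & 'hi'</p>");
echo $out, "\n";
// &lt;p class=&quot;x&quot;&gt;你好 &amp; 'hi'&lt;/p&gt;
```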
Dan answered Oct 16 '22 23:10