
Charset encoding problem

Tags:

php

mysql

I am developing an Arabic web site and use AJAX to save some text in my database. The AJAX itself works fine. My problem is that when I read the saved data back from the database and print it on the screen, I get garbled text. I used the PHP function mb_detect_encoding to check how the database stores the text, and it returned UTF-8. So I used iconv("windows-1256","UTF-8",$row["text"]) to print the text on the screen, but it still returns the same garbled text. Can anyone give me a hand? Thanks.

Hassan asked Nov 25 '10 13:11



1 Answer

please take a look at this thread (and use the search before posting a question first).

in your case, i think you've forgotten to set the correct charset for your database connection (using a SET NAMES statement or mysql_set_charset()) - but that's hard to say.

this is a quote from chazomaticus, who has given a perfect answer in the linked thread, listing all the points you have to take care of:

Storage:

  • Specify utf8_unicode_ci (or equivalent) collation on all tables and text columns in your database. This makes MySQL physically store and retrieve values natively in UTF-8.
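As a rough illustration of the storage point, the statement below pins both the character set and the collation on a table. The table and column names (`messages`, `body`) and the helper `buildCreateTableSql()` are made up for this example; the helper only exists so the SQL can be handed to whatever DB wrapper you use. The `utf8` names follow the quote above; on current MySQL versions `utf8mb4` with `utf8mb4_unicode_ci` is the variant that covers all of Unicode, so adjust as needed.

```php
<?php
// Hypothetical helper: builds a CREATE TABLE statement that pins the
// character set and collation. Table and column names are illustrative only.
function buildCreateTableSql(string $table): string
{
    // Only allow simple identifiers, since the name is interpolated into SQL.
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $table)) {
        throw new InvalidArgumentException('Unexpected table name');
    }

    return "CREATE TABLE `{$table}` (\n"
        . "  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,\n"
        . "  body TEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL\n"
        . ") DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci";
}

echo buildCreateTableSql('messages'), "\n";
```

Setting the charset and collation at both table and column level is redundant here, but it makes the intent visible when a column is later copied into another table.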

Retrieval:

  • In PHP, in whatever DB wrapper you use, you'll need to set the connection charset to utf8. This way, MySQL does no conversion from its native UTF-8 when it hands data off to PHP.
      • Note that if you don't use a DB wrapper, you'll probably have to issue a query to tell MySQL to give you results in UTF-8: SET NAMES 'utf8' (as soon as you connect).
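A minimal sketch of the retrieval step, assuming the mysqli extension (the old `mysql_*` functions, including `mysql_set_charset()`, were removed in PHP 7). The function name `openUtf8Connection()` and all connection parameters are placeholders, not something from the original thread.

```php
<?php
// Sketch only: host, credentials and database name are placeholders.
function openUtf8Connection(string $host, string $user, string $pass, string $db): mysqli
{
    // Make mysqli throw exceptions instead of returning false on errors.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $conn = new mysqli($host, $user, $pass, $db);

    // Set the connection charset right after connecting. The PHP manual
    // prefers this over a raw "SET NAMES" query because it also informs
    // the client library (relevant for real_escape_string()).
    $conn->set_charset('utf8');

    return $conn;
}

// Usage (not executed here, it needs a running MySQL server):
// $conn = openUtf8Connection('localhost', 'user', 'secret', 'site');
```

With the connection charset set, values read from `$row["text"]` should already be UTF-8, so no `iconv()` call would be needed before printing them.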

Delivery:

  • You've got to tell PHP to deliver the proper headers to the client, so text will be interpreted as UTF-8. In PHP, you can use the default_charset php.ini option, or manually issue the Content-Type header yourself, which is just more work but has the same effect.
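The manual variant of the delivery step could look roughly like this. `contentTypeHeader()` is a hypothetical helper introduced only for the example; the `text/html` media type is an assumption about what the page serves.

```php
<?php
// Hypothetical helper: builds the Content-Type header line for HTML pages.
function contentTypeHeader(string $charset = 'UTF-8'): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Headers must go out before any output; headers_sent() guards against
// the "headers already sent" warning.
if (!headers_sent()) {
    header(contentTypeHeader());
}

// Alternative mentioned above: set default_charset once (php.ini or at
// runtime) and let PHP add the charset to the header itself:
// ini_set('default_charset', 'UTF-8');
```

For an AJAX endpoint that returns something other than HTML, the same idea applies with a different media type in the header.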

Submission:

  • You want all data sent to you by browsers to be in UTF-8. Unfortunately, the only way to reliably do this is add the accept-charset attribute to all your <form> tags: <form ... accept-charset="UTF-8">.
  • Note that the W3C HTML spec says that clients "should" default to sending forms back to the server in whatever charset the server served, but this is apparently only a recommendation, hence the need for being explicit on every single <form> tag.
  • Although, on that front, you'll still want to verify every submitted string as being valid UTF-8 before you try to store it or use it anywhere. PHP's mb_check_encoding() does the trick, but you have to use it religiously.
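One way to apply the validation step consistently is to route every submitted string through a single check. This is only a sketch: `requireUtf8()` is a made-up name, and rejecting invalid input with an exception is one possible policy among several.

```php
<?php
// Hypothetical helper: returns the submitted value only if it is valid UTF-8.
function requireUtf8(string $value): string
{
    if (!mb_check_encoding($value, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $value;
}

// Usage with form or AJAX data (the "text" field name is an assumption):
// $text = requireUtf8($_POST['text'] ?? '');

// "\xD9\x85\xD8\xB1" are the UTF-8 bytes of two Arabic letters and pass;
// "\xC7\xE1" are windows-1256 bytes and are rejected.
var_dump(requireUtf8("\xD9\x85\xD8\xB1") === "\xD9\x85\xD8\xB1");
```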

Processing:

  • This is, unfortunately, the hard part. You need to make sure that every time you process a UTF-8 string, you do so safely. Easiest way to do this is by making extensive use of PHP's mbstring extension.
  • PHP's string operations are NOT by default UTF-8 safe. There are some things you can safely do with normal PHP string operations (like concatenation), but for most things you should use the equivalent mbstring function.
  • To know what you're doing (read: not mess it up), you really need to know UTF-8 and how it works on the lowest possible level. Check out any of the links from utf8.com for some good resources to learn everything you need to know.
  • Also, I feel like this should be said somewhere, even though it may seem obvious: every PHP or HTML file you'll be serving should be encoded in valid UTF-8.
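To make the difference between byte-based and character-based string functions concrete, here is a small sketch using a five-letter Arabic word. The word is written as escaped UTF-8 bytes so the example does not depend on how the file itself is saved; the variable names are arbitrary.

```php
<?php
// "مرحبا" (5 Arabic letters) written out as its UTF-8 bytes.
$word = "\xD9\x85\xD8\xB1\xD8\xAD\xD8\xA8\xD8\xA7";

// strlen() counts bytes, mb_strlen() counts characters.
$byteLength = strlen($word);             // 10, two bytes per letter
$charLength = mb_strlen($word, 'UTF-8'); // 5

// substr() works on bytes and can cut a character in half;
// mb_substr() works on characters.
$brokenCut = substr($word, 0, 3);
$safeCut   = mb_substr($word, 0, 2, 'UTF-8');

var_dump($byteLength, $charLength);
var_dump(mb_check_encoding($brokenCut, 'UTF-8')); // false: ends mid-character
var_dump(mb_check_encoding($safeCut, 'UTF-8'));   // true
```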

note that you don't need to use utf-8 - the important part is to use the same charset everywhere, independent of what charset that might be. but if you need to change things anyway, use utf-8.
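to illustrate why a mismatch hurts, here is a small sketch of what happens when text that is already UTF-8 is converted as if it were windows-1256, which is what the iconv() call in the question does if the stored bytes really are UTF-8. the helper name convertFromWindows1256() is made up, and the sketch assumes the iconv extension supports windows-1256 on your platform.

```php
<?php
// Hypothetical helper mirroring the iconv() call from the question:
// it treats the input bytes as windows-1256 and converts them to UTF-8.
function convertFromWindows1256(string $bytes): string
{
    $converted = iconv('windows-1256', 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException('iconv could not convert the input');
    }
    return $converted;
}

// "مرحبا" as UTF-8 bytes, i.e. text that needs no conversion at all.
$alreadyUtf8 = "\xD9\x85\xD8\xB1\xD8\xAD\xD8\xA8\xD8\xA7";

// Each UTF-8 byte is misread as a separate windows-1256 character,
// so the result is longer than the input and no longer the same text.
$garbled = convertFromWindows1256($alreadyUtf8);

var_dump($garbled === $alreadyUtf8);
var_dump(strlen($alreadyUtf8), strlen($garbled));
```

if the data comes back from the connection as UTF-8 and the page is served as UTF-8, the text can be printed as it is, without any conversion step.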

oezi answered Sep 28 '22 06:09
