
Guides for dealing with Unicode in PHP5?

Hey everybody. I'm developing a new site (php5/mySQL) and am looking to finally get on the Unicode bandwagon. I'll admit to knowing next to absolutely nothing about supporting Unicode at the moment, but I'm hoping to resolve that with your help.

After desperately flexing my tiny, pathetic excuses for Googlefu-muscles, and scouring over each page that looked promising to my Unicode-newbie eyes, I have come to the conclusion that, while not entirely supported, my precious language of choice (PHP for those that have forgotten) has made at least a half-assed attempt at managing the foreign beast (and from what else I see, succeeding?). I have also come to the conclusion that

<?php header('Content-Type: text/html; charset=utf-8'); ?>

is a great place to start and that I should be looking into supporting UTF-8 since I have plenty of space on my (shared, for the moment) hosting.

However, I'm not sure what this strange functionality known as mb_* means or how to incorporate it into functions such as strlen() and . . . to be honest at this point I don't know what other functionality (that I can't live without) is affected.

So I've come to you SO-ites in search of enlightenment and possibly straightening out my confused (where Unicode is concerned!) brain. I really want to support it but I need serious help.

P.S.: Does Unicode affect mysql_real_escape_string() or any other SQL injection/XSS prevention and security measures? I need to stay on top of this as well!

Thanks ahead of time.

  • Adding JavaScript into the mix, since I'll be using a mix of plain JavaScript and jQuery and know nothing about Unicode support in that language. ;)
Zydeco asked Jan 18 '11 02:01


1 Answer

  1. Welcome aboard UTF-8 :)
  2. Use the mb_* functions (mb_strlen(), mb_substr(), and so on) in place of the traditional str* functions wherever a multibyte equivalent exists; the str* functions count bytes, not characters.
  3. MySQL and its API have supported UTF-8 well for a long time. The only requirement is that you set the encoding when connecting and when saving data. Google for 'SET NAMES utf8'.
  4. Note the 'u' modifier for the preg_* functions, which tells them to treat the pattern and the subject as UTF-8.
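
To make points 2 to 4 concrete, here is a minimal sketch, not a definitive setup. It assumes the mbstring extension is loaded. The helper name openUtf8Connection() and its parameters are made up for illustration. It uses mysqli because the old mysql_* extension was removed in PHP 7, and it sets the charset with set_charset() so the escaping functions know the connection encoding too. The 'utf8mb4' charset is my assumption (it needs MySQL 5.5.3 or later); 'utf8' is what the answer above mentions.

```php
<?php
// Assumption: the mbstring extension is available.
mb_internal_encoding('UTF-8');

// "\xC3\xAF" is the UTF-8 encoding of "ï", so $word is "naïve":
// 5 characters stored in 6 bytes.
$word = "na\xC3\xAFve";

$bytes  = strlen($word);          // 6: strlen() counts bytes
$chars  = mb_strlen($word);       // 5: mb_strlen() counts characters
$first3 = mb_substr($word, 0, 3); // "naï": cuts on character boundaries
$upper  = mb_strtoupper($word);   // "NAÏVE"

// With the 'u' modifier, "." matches one whole UTF-8 character.
$fiveChars = preg_match('/^.{5}$/u', $word) === 1; // true
// Without it, "." matches a single byte, and the string has 6 of them.
$fiveBytes = preg_match('/^.{5}$/', $word) === 1;  // false

// Hypothetical helper: open a connection and set its charset.
// Credentials and database name are placeholders supplied by the caller.
function openUtf8Connection($host, $user, $pass, $dbName)
{
    $db = new mysqli($host, $user, $pass, $dbName);
    // Preferred over running "SET NAMES ..." by hand, because it also
    // tells the client library which encoding real_escape_string() sees.
    $db->set_charset('utf8mb4');
    return $db;
}
```

The helper is only defined here, never called, so the snippet runs without a database. As for the P.S.: escaping for SQL and escaping for HTML output are separate jobs, and both need to know the encoding; for output, htmlspecialchars() takes the charset as its third argument.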
Dennis Kreminsky answered Sep 19 '22 21:09