Is UTF-8 acceptable for reading/writing Asian languages?

Tags: c#, unicode, utf-8

I am accepting user input via a web form (as UTF-8), saving it to a MySQL database (using the UTF-8 character set), and later generating a text file (also encoded as UTF-8). Is there any chance of text corruption from using UTF-8 instead of something like UCS-2? Is UTF-8 good enough in this situation?

asked Aug 11 '09 17:08 by Jon Tackabury


2 Answers

UTF-8 is more than good enough: it is arguably the only encoding you should ever consider using.

Some great reading on the subject:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky

answered Oct 06 '22 00:10 by karim79


If you are working with a great deal of Asian text (more than Latin text), you may want to consider UTF-16. UTF-8 can accurately represent every Unicode character, but it is optimized for text that is mostly ASCII: ASCII characters take one byte each, while most CJK characters take three. UTF-16 uses two bytes for every character in the Basic Multilingual Plane, which includes the common CJK ideographs, kana, and Hangul.
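To make the size trade-off concrete, here is a minimal Python sketch that measures the same strings in both encodings. Python is used only for illustration (the question's own stack is C#), and the sample strings and the `encoded_sizes` helper are hypothetical, not taken from the original post.

```python
# Compare encoded sizes of the same text in UTF-8 and UTF-16.
# SAMPLES and encoded_sizes are illustrative assumptions, not from the post.

SAMPLES = {
    "ascii": "Hello, world",
    "japanese": "こんにちは世界",  # hiragana + kanji, all in the BMP
    "chinese": "你好世界",         # CJK ideographs, all in the BMP
}


def encoded_sizes(text: str) -> dict:
    """Return the character count and byte lengths in UTF-8 and UTF-16."""
    return {
        "chars": len(text),
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" omits the byte-order mark that plain "utf-16" prepends,
        # so this counts only the bytes of the text itself.
        "utf-16": len(text.encode("utf-16-le")),
    }


if __name__ == "__main__":
    for name, text in SAMPLES.items():
        print(name, encoded_sizes(text))
```

For the Japanese sample this reports 21 bytes in UTF-8 versus 14 in UTF-16, while the ASCII sample reports 12 versus 24, which is the asymmetry the answer describes.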

But UTF-8 is most certainly "good enough": no corruption will arise simply because you use UTF-8 rather than, say, UTF-16.
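As a quick sanity check of the no-corruption claim, the following minimal Python sketch round-trips mixed Japanese, Korean, and Chinese text through both UTF-8 and UTF-16 and confirms the string comes back unchanged. The sample string and helper names are hypothetical. It also shows why a character outside the Basic Multilingual Plane (here U+2000B) is a problem for UCS-2, which has only a single 16-bit unit per character. UTF-16 needs a surrogate pair for it, and UTF-8 simply uses four bytes.

```python
# Round-trip text through UTF-8 and UTF-16 and check that nothing is lost.
# SAMPLE, round_trips and utf16_code_units are illustrative assumptions.

SUPPLEMENTARY = "\U0002000B"  # a CJK Extension B ideograph, outside the BMP
SAMPLE = "日本語 한국어 中文 " + SUPPLEMENTARY


def round_trips(text: str, encoding: str) -> bool:
    """True if encoding and then decoding `text` yields the identical string."""
    return text.encode(encoding).decode(encoding) == text


def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store `text` in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    for enc in ("utf-8", "utf-16"):
        print(enc, round_trips(SAMPLE, enc))  # True for both encodings
    # The supplementary character needs a surrogate pair (two code units)
    # in UTF-16; a fixed two-byte encoding like UCS-2 cannot express it.
    print(utf16_code_units(SUPPLEMENTARY))        # 2
    print(len(SUPPLEMENTARY.encode("utf-8")))     # 4
```

One practical caveat for the asker's setup, not raised in the answers: MySQL's older `utf8` character set stores at most three bytes per character, so supplementary characters like the one above need the `utf8mb4` character set to survive the database step.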

answered Oct 06 '22 00:10 by John Calsbeek