Is UTF-8 acceptable for reading/writing Asian languages?

Question

I am accepting user input via a web form (as UTF-8), saving it to a MySQL DB (using the UTF-8 character set), and later generating a text file (encoded as UTF-8). Is there any chance of text corruption from using UTF-8 instead of something like UCS-2? Is UTF-8 good enough in this situation?

karim79 · Accepted Answer

UTF-8 is more than good enough; it is perhaps the only encoding you should ever consider using.

Some great reading on the subject:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky

John Calsbeek · Answer

If you are working with a great deal of Asian text (more so than Latin text), you may want to consider UTF-16. UTF-8 can accurately represent the entire Unicode range of characters, but it is optimized for text that is mostly ASCII: ASCII characters take one byte each, while most CJK characters take three. UTF-16 is more space-efficient across the entire Basic Multilingual Plane, where every character takes exactly two bytes.
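To put numbers on that space trade-off, here is a minimal sketch in Python (chosen only for illustration; the question itself is tagged c#). The helper `encoded_sizes` and the sample strings are hypothetical, picked arbitrarily. It encodes each string both ways and counts bytes. `utf-16-le` is used so Python does not prepend a byte order mark, which would add two bytes to every UTF-16 result.

```python
# Hypothetical helper: compare how many bytes the same text needs in UTF-8 vs UTF-16.
# "utf-16-le" is used so Python does not prepend a byte order mark (BOM),
# which would add 2 bytes and skew the comparison.

def encoded_sizes(text):
    """Return (utf8_byte_count, utf16_byte_count) for the given text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Arbitrary sample strings chosen for illustration.
samples = {
    "ASCII": "Hello",
    "Japanese": "こんにちは",  # 5 Hiragana characters, all in the BMP
    "Chinese": "你好世界",     # 4 CJK ideographs, all in the BMP
}

for label, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{label:<9} chars={len(text)} utf-8={utf8_len} bytes utf-16={utf16_len} bytes")
```

The ASCII sample is half the size in UTF-8 (5 vs 10 bytes), while the Japanese and Chinese samples are 50% larger (15 vs 10 and 12 vs 8 bytes). In every case the text itself is stored intact; only the byte count differs.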

But UTF-8 is most certainly "good enough": you will not get corruption simply because you chose UTF-8 over, say, UTF-16.
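To check the "no corruption" claim, the following minimal Python sketch models only the last step of the pipeline described in the question, generating a UTF-8 text file. The web form and MySQL steps are not modeled. The function `file_round_trip` and the sample strings are hypothetical and chosen for illustration. One sample includes "𠮷" (U+20BB7), which lies outside the Basic Multilingual Plane. UCS-2 cannot represent that character at all, while UTF-8 handles it with four bytes.

```python
import os
import tempfile

def file_round_trip(text):
    """Write text to a temporary UTF-8 file, read it back, and return the result.

    This mimics only the "generate a text file" step; the web form and
    MySQL steps are not modeled here.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)  # reopen below with an explicit encoding
    try:
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(text)
        with open(path, "r", encoding="utf-8", newline="") as f:
            return f.read()
    finally:
        os.remove(path)

# Arbitrary Japanese, Chinese and Korean samples. "𠮷" (U+20BB7) lies outside
# the Basic Multilingual Plane, so UCS-2 cannot represent it at all.
samples = ["日本語のテキスト", "中文文本", "한국어 텍스트", "𠮷野家"]

for text in samples:
    restored = file_round_trip(text)
    assert restored == text, f"corrupted: {text!r} -> {restored!r}"
    print(f"{text}: {len(text.encode('utf-8'))} UTF-8 bytes, round trip OK")

# A code point above U+FFFF takes 4 bytes in UTF-8 and a 4-byte surrogate pair in UTF-16.
big = "𠮷"
print(hex(ord(big)), len(big.encode("utf-8")), len(big.encode("utf-16-le")))
```

As long as every stage reads and writes with the same declared encoding (here `encoding="utf-8"` on both `open` calls), the text comes back unchanged.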


Tags: c#, unicode, utf-8

Jon Tackabury
