I am accepting user input via a web form (as UTF-8), saving it to a MySQL DB (using the UTF-8 character set) and generating a text file later (encoded as UTF-8). Is there any chance of text corruption from using UTF-8 instead of something like UCS-2? Is UTF-8 good enough in this situation?
More than that, it is perhaps the only encoding you should ever consider using.
Some great reading on the subject:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
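If you are working with a great deal of Asian text (more so than Latin text), you may want to consider UTF-16. UTF-8 can accurately represent the entire Unicode range of characters, but it is most compact for text that is mostly ASCII: ASCII characters take 1 byte each, while most CJK characters take 3. UTF-16 uses 2 bytes for every character in the Basic Multilingual Plane, so it is smaller for text dominated by CJK and similar scripts. Characters outside the BMP take 4 bytes in both encodings.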
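To make the size trade-off concrete, here is a minimal Python sketch that counts encoded bytes for a few strings. The helper name `encoded_sizes` and the sample strings are illustrative assumptions, not anything from the question; `utf-16-le` is used so that the byte-order mark is not counted.

```python
def encoded_sizes(text):
    """Return the encoded byte length of text in UTF-8 and UTF-16.

    Hypothetical helper for illustration. "utf-16-le" is chosen so the
    2-byte byte-order mark is not included in the count.
    """
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


samples = {
    "latin": "Hello, world",    # 12 ASCII characters
    "cjk": "こんにちは世界",      # 7 characters, all inside the BMP
    "emoji": "😀",              # 1 character outside the BMP
}

for label, text in samples.items():
    print(label, encoded_sizes(text))
```

For these samples the ASCII string is 12 bytes in UTF-8 and 24 in UTF-16, the Japanese string is 21 bytes in UTF-8 and 14 in UTF-16, and the emoji is 4 bytes in both. Whether that difference matters depends on how much of the stored text is actually non-ASCII.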
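But UTF-8 is most certainly "good enough": there will not be corruption arising simply because you are using UTF-8 rather than, say, UTF-16. If anything, UCS-2 is the riskier choice, since it cannot represent characters outside the Basic Multilingual Plane.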
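A quick way to convince yourself is a round-trip check. The sketch below is only an illustration under assumed names (`round_trips` and the sample string are hypothetical): it encodes and decodes the same text with each encoding, then shows the kind of corruption that does occur in practice, which comes from decoding bytes with a different encoding than the one used to write them.

```python
def round_trips(text, encoding):
    """Return True if text survives encode -> decode unchanged.

    Hypothetical helper for illustration only.
    """
    return text.encode(encoding).decode(encoding) == text


sample = "naïve café, 日本語, 😀"

# Both encodings cover all of Unicode, so both round-trip losslessly.
print(round_trips(sample, "utf-8"))
print(round_trips(sample, "utf-16"))

# Corruption comes from a mismatch between the encoding used to write
# the bytes and the one used to read them, not from UTF-8 itself.
mojibake = "é".encode("utf-8").decode("latin-1")
print(mojibake)
```

Both round-trip checks print `True`, and the mismatched decode prints `Ã©`. In the setup described in the question, the practical takeaway is to make sure the form, the database connection, the table character set, and the file writer all agree on the same encoding.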