It's annoying to see even the most professional sites get this wrong: posted text turns into something unreadable. I don't know much about encodings; I just want to understand the underlying problem that makes such a basic thing so hard.
- Does HTTP encoding limit some characters?
- Do users need to send info about the charset/encoding they are using?
- Assuming everything arrives at the server as-is, is the encoding used when saving that text causing the problem?
- Is it something about browser implementations?
- Do we need some JavaScript tricks to make it work?
Is there a definitive solution to this? It may have its limits, but Stack Overflow seems to make it work.
I suspect one needs to make sure that the whole stack handles the encoding with care:
- Specify a web page font (CSS) that supports a wide range of international characters.
- Specify the correct lang/charset HTML attributes and make sure the browser is using the correct encoding.
- Make sure HTTP requests are sent with the appropriate charset specified in the headers.
- Make sure the content of the HTTP requests is decoded properly in your web request handler.
- Configure your database/datastore with an internationalization-friendly encoding/collation (such as UTF-8/UTF-16), not one that only supports Latin characters (the default in some DBs).
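To make the middle two points concrete, here is a minimal sketch in Python of what a careful request handler might do: read the charset out of the Content-Type header instead of trusting a default, decode the raw body explicitly, and declare the charset again on the response. The function name and arguments are hypothetical stand-ins for whatever your framework actually hands you.

```python
def handle_request(raw_body: bytes, content_type: str) -> tuple[dict, bytes]:
    """Hypothetical handler: decode the body using the declared charset."""
    # Parse the charset out of e.g. "text/plain; charset=utf-8",
    # falling back to UTF-8, the sensible default on today's web.
    charset = "utf-8"
    for part in content_type.split(";")[1:]:
        key, _, value = part.strip().partition("=")
        if key.lower() == "charset" and value:
            charset = value

    text = raw_body.decode(charset)  # raises if the bytes don't match the declared charset

    # Always declare the charset on the way back out, too.
    headers = {"Content-Type": "text/html; charset=utf-8"}
    return headers, text.encode("utf-8")

headers, body = handle_request("Träume".encode("utf-8"),
                               "text/plain; charset=utf-8")
print(body.decode("utf-8"))  # Träume
```

The point is symmetry: every hop that turns bytes into text (or back) must agree on the encoding, and the headers are how the two sides tell each other which one they used.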
The first few are normally handled by the browser and your web framework of choice, but if you screw up the DB encoding or use a font with a limited character set, no one will save you.
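What "screwing up" one layer looks like is easy to demonstrate: if any hop in the stack decodes UTF-8 bytes as Latin-1, you get the classic garbled output from the question, and the damage survives being re-encoded and stored.

```python
original = "naïve café"
utf8_bytes = original.encode("utf-8")

# A misconfigured layer (DB connection, template, ...) decodes
# the UTF-8 bytes with the wrong charset:
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # naÃ¯ve cafÃ©
```

Each non-ASCII character becomes two junk characters because UTF-8 encodes it as two bytes, and Latin-1 treats each byte as its own character. As long as nothing else mangles the data, the mistake is reversible (`garbled.encode("latin-1").decode("utf-8")` recovers the original), which is why fixing the one broken layer often repairs the whole pipeline.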