I'm designing an API that takes a URL as input and reads the content at that URL. When the URL uses the "file:" scheme, what would be the best default character encoding?
The API allows this to be set explicitly. Also, there are a few heuristics we can use to determine the character encoding, like the BOM if available, but when all of these fail, what should be the default?
As far as I can tell, the standards are silent on this issue. All else being equal, I want the right thing to happen most often for someone who doesn't even know there is such a thing as character encoding.
Use UTF-8 as the default whenever possible, and state this in your API documentation. UTF-8 is a rock-solid encoding standard and very future proof. I would avoid creating potential work for yourself by supporting other encodings. UTF-8 will also be easy to use if you later migrate the API so that it can be accessed via a web service.
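As a rough sketch of the fallback order described in the question (explicit setting first, then the BOM, then a UTF-8 default), something like the following could work. The names `pick_encoding` and `read_text` are made up for illustration and are not part of any existing API; the sketch assumes Python's standard `codecs` and `urllib.request` modules.

```python
import codecs
from urllib.request import urlopen

# BOM signatures, checked in this order so that the UTF-32 LE BOM
# (FF FE 00 00) is not mistaken for the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def pick_encoding(data, explicit=None, default="utf-8"):
    """Hypothetical helper: choose an encoding for raw bytes."""
    # 1. An encoding set explicitly by the caller always wins.
    if explicit:
        return explicit
    # 2. Otherwise, use the BOM if one is present.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 3. When every heuristic fails, fall back to UTF-8.
    return default


def read_text(url, encoding=None):
    """Hypothetical helper: read a URL (including file: URLs) as text."""
    with urlopen(url) as response:
        data = response.read()
    return data.decode(pick_encoding(data, encoding))
```

A call such as `read_text("file:///some/path.txt")` would then decode as UTF-8 unless a BOM or an explicit `encoding` argument says otherwise. Note that strict UTF-8 decoding raises `UnicodeDecodeError` on bytes that are not valid UTF-8, so a real API would need to decide whether to report that error or substitute replacement characters.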