 

What should be the default encoding for an API that reads from a URL using the file: protocol?

I'm designing an API that takes a URL as input and reads the content at that URL. When the URL uses the "file:" protocol, which would make a better default for the character encoding?

  • the system's native encoding
  • UTF-8
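To make the two options concrete, here is a small sketch in Python (chosen only for illustration; the question does not name a language). It shows where each candidate default comes from and how the same bytes decode differently. `cp1252` is an assumption standing in for a typical Western-European Windows native encoding; the real native encoding depends on the machine and its locale.

```python
import locale

# Candidate 1: the platform's preferred ("native") encoding.
# This varies by OS and locale: commonly "UTF-8" on Linux and macOS,
# often "cp1252" on Western-European Windows installations.
native = locale.getpreferredencoding(False)

# Candidate 2: a fixed, platform-independent default.
fixed = "utf-8"

# The same file contents, as written by a program that used UTF-8.
data = "café".encode("utf-8")  # b'caf\xc3\xa9'

print("native encoding on this machine:", native)
print("decoded as UTF-8: ", data.decode(fixed))     # café
print("decoded as cp1252:", data.decode("cp1252"))  # cafÃ©
```

With a native-encoding default, the same file can read correctly on one machine and as mojibake on another; a fixed default at least makes the result the same everywhere.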

The API allows the encoding to be set explicitly. There are also a few heuristics we can use to determine the character encoding, such as the byte-order mark (BOM) if one is present. But when all of these fail, what should the default be?
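The lookup order described above (explicit setting, then BOM, then a fallback default) can be sketched in Python as follows. `decode_content` and its parameters are hypothetical names, and the `"utf-8"` fallback is just a placeholder for whichever default is chosen; the BOM constants come from the standard `codecs` module.

```python
import codecs
from typing import Optional

# BOM signatures, checked longest first: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE), so UTF-32 must be tried first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_content(data: bytes, encoding: Optional[str] = None,
                   default: str = "utf-8") -> str:
    """Decode bytes read from a URL (hypothetical helper).

    Precedence: explicit encoding > byte-order mark > default.
    """
    if encoding is not None:
        # The caller knows best; any BOM is left in the text as U+FEFF.
        return data.decode(encoding)
    for bom, name in _BOMS:
        if data.startswith(bom):
            # Strip the BOM so it does not appear in the decoded text.
            return data[len(bom):].decode(name)
    # No explicit setting and no BOM: fall back to the default.
    return data.decode(default)


print(decode_content(codecs.BOM_UTF8 + "héllo".encode("utf-8")))  # héllo
print(decode_content(b"plain ascii"))                              # plain ascii
print(decode_content(b"\xe9", encoding="latin-1"))                 # é
```

Only the last branch is affected by the choice of default, which is why it matters most for files that carry no BOM, such as most UTF-8 text.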

As far as I can tell, the standards are silent on this issue. All else being equal, I want the right thing to happen most often for someone who doesn't even know there is such a thing as character encoding.

Matthew Simoneau asked Nov 05 '22 06:11



1 Answer

Always use UTF-8 if possible, and document that default in your API documentation. UTF-8 is a rock-solid, future-proof encoding standard. I would avoid creating extra work for yourself by supporting other encodings. UTF-8 will also be easy to use if you later migrate the API so that it can be accessed as a Web Service.
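A minimal sketch of this recommendation in Python, assuming a hypothetical `read_url` function and `DEFAULT_ENCODING` constant: content is decoded as UTF-8 unless the caller passes an encoding, and the host's native encoding is never consulted. It uses the standard library's `urllib.request.urlopen`, which can open file: URLs.

```python
import tempfile
import urllib.request
from pathlib import Path
from typing import Optional

# Documented default, independent of the host OS and locale.
DEFAULT_ENCODING = "utf-8"


def read_url(url: str, encoding: Optional[str] = None) -> str:
    """Read the content at a URL as text (hypothetical API).

    Uses the caller's encoding if given, otherwise UTF-8, so the same
    file decodes the same way on every machine.
    """
    with urllib.request.urlopen(url) as response:
        data = response.read()
    return data.decode(encoding or DEFAULT_ENCODING)


# Usage: write files to a temporary directory, then read them back
# through file: URLs.
with tempfile.TemporaryDirectory() as tmp:
    utf8_path = Path(tmp) / "utf8.txt"
    utf8_path.write_text("naïve café", encoding="utf-8")
    print(read_url(utf8_path.as_uri()))  # naïve café

    # A file known to be in another encoding needs an explicit override.
    latin1_path = Path(tmp) / "latin1.txt"
    latin1_path.write_text("naïve café", encoding="latin-1")
    print(read_url(latin1_path.as_uri(), encoding="latin-1"))  # naïve café
```

A side benefit of a fixed default is that it matches what most Web Services and JSON-based APIs already expect, so the behavior stays the same if the API is later exposed over HTTP.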

Dave Kerr answered Nov 10 '22 16:11
