 

What are some common character encodings that a text editor should support?

I have a text editor that can load ASCII and Unicode files. It automatically detects the encoding by looking for the BOM at the beginning of the file and/or searching the first 256 bytes for characters > 0x7f.
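The detection approach described above (check for a BOM, then scan the first 256 bytes for anything above 0x7F) might be sketched as follows. This is a minimal illustration assuming the editor has the raw file bytes in memory. The function name `detect_encoding` and its return convention are hypothetical: it returns a Python codec name when a BOM or an all-ASCII prefix decides the question, and `None` when high bytes are present without a BOM and further heuristics are needed.

```python
import codecs

# Longest BOMs first: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so checking UTF-16 first would misclassify it.
# (A UTF-16-LE file whose first character is U+0000 is still ambiguous.)
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_encoding(data: bytes, scan_len: int = 256):
    """Hypothetical helper: BOM check first, then a scan for bytes > 0x7F.

    The returned codec names ("utf-32", "utf-16", "utf-8-sig") make Python
    read the byte order from the BOM and strip it when decoding.
    """
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec
    # Only the first scan_len bytes are inspected, as in the question, so a
    # non-ASCII byte later in the file will not be seen here.
    if all(b <= 0x7F for b in data[:scan_len]):
        return "ascii"
    return None  # high bytes but no BOM: undecided
```

For example, `raw.decode(detect_encoding(raw) or "latin-1")` would decode a file with a fallback; Latin-1 is an arbitrary choice here, picked only because it accepts every byte value.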

What other encodings should be supported, and what characteristics would make that encoding easy to auto-detect?

Nathan Osman asked Dec 29 '22 04:12



2 Answers

Definitely UTF-8. See http://www.joelonsoftware.com/articles/Unicode.html.

As far as I know, there's no guaranteed way to detect UTF-8 automatically, although scanning the file can reduce the probability of a wrong guess to a very small amount.
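One common form of that scanning heuristic: if the bytes contain non-ASCII values and still decode as strictly valid UTF-8, the file is very likely UTF-8, because text in legacy 8-bit encodings rarely forms valid multi-byte sequences by accident. The sketch below is an illustration, not a guaranteed detector. The name `looks_like_utf8` and the `truncated` flag are hypothetical; the flag lets a sample cut off mid-character (such as the first 256 bytes of a file) pass the check instead of failing on its last incomplete sequence.

```python
import codecs


def looks_like_utf8(data: bytes, truncated: bool = False) -> bool:
    """Hypothetical heuristic: True if data has non-ASCII bytes and is valid UTF-8."""
    if data.isascii():
        # Pure ASCII is valid UTF-8 but gives no evidence either way.
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()  # strict error handling
    try:
        # With final=False, an incomplete sequence at the very end is buffered
        # instead of raising; malformed bytes anywhere else still raise.
        decoder.decode(data, final=not truncated)
    except UnicodeDecodeError:
        return False
    return True
```

A false result for Latin-1 text like `b"caf\xe9 au lait"` comes from `0xE9` being a UTF-8 lead byte that must be followed by continuation bytes (`0x80`-`0xBF`), and a space is not one.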

Steve Emmerson answered Jan 04 '23 15:01



I don't know about encodings, but make sure it supports the different line-ending conventions! (\n on Unix, \r\n on Windows, and \r on classic Mac OS)
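A simple way to handle this is to count each convention after decoding, remember the dominant one so the file can be saved back the same way, and normalize to `\n` for editing. This is a minimal sketch; the function names `detect_line_ending` and `normalize_newlines` are hypothetical, and the choice to prefer `\r\n` on ties and default to `\n` for files with no line breaks is an assumption.

```python
def detect_line_ending(text: str) -> str:
    """Hypothetical helper: return the most frequent line ending in text."""
    crlf = text.count("\r\n")
    # Subtract the CRLF pairs so lone LF and lone CR are counted separately.
    lf = text.count("\n") - crlf
    cr = text.count("\r") - crlf
    counts = {"\r\n": crlf, "\n": lf, "\r": cr}
    # On a tie, max() keeps the first key in dict order, i.e. "\r\n".
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "\n"


def normalize_newlines(text: str) -> str:
    """Convert CRLF and lone CR to LF (CRLF must be replaced first)."""
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

When saving, `normalize_newlines(buffer).replace("\n", ending)` would restore the ending recorded on load. `str.splitlines()` is avoided here because it also breaks on characters such as form feed and U+2028.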

If you haven't checked out Michael Kaplan's blog yet, I suggest doing so: http://blogs.msdn.com/michkap/

This article in particular may be useful: http://www.siao2.com/2007/04/22/2239345.aspx

mletterle answered Jan 04 '23 16:01
