How to read text file without knowing the encoding

Question

When reading a text file that was created outside my app, the encoding is unknown. My app has been using NSUnicodeStringEncoding (which is the same as NSUTF16StringEncoding), so it has problems reading files that are not UTF-16 encoded.

Is there a way I can guess the encoding of a file? My priority is to be able to read UTF-8 files, and then all other files. Is iterating through the available encodings and checking whether the resulting string's length is greater than zero really a good approach?

Thanks in advance.

Ignacio

Ole Begemann · Accepted Answer

Apple's documentation has some guidance on how to proceed. From the String Programming Guide, "Reading data with an unknown encoding":

If you are forced to guess the encoding (and note that in the absence of explicit information, it is a guess):

1. Try stringWithContentsOfFile:usedEncoding:error: or initWithContentsOfFile:usedEncoding:error: (or the URL-based equivalents). These methods try to determine the encoding of the resource, and if successful return by reference the encoding used.

2. If (1) fails, try to read the resource by specifying UTF-8 as the encoding.

3. If (2) fails, try an appropriate legacy encoding. "Appropriate" here depends a bit on circumstances; it might be the default C string encoding, it might be ISO or Windows Latin 1, or something else, depending on where your data is coming from.
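
The three steps above could be sketched roughly as follows. This is only an illustration, not code from Apple's guide: the helper name ReadTextFileGuessingEncoding is hypothetical, and the choice of Windows Latin 1 (NSWindowsCP1252StringEncoding) as the legacy fallback is an assumption that should be replaced with whatever fits the source of your files.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): reads a text file by trying,
// in order, Foundation's own detection, UTF-8, and one legacy encoding.
// On success, *outEncoding (if non-NULL) receives the encoding that worked.
// On failure, returns nil and *outError (if non-NULL) receives the last error.
static NSString *ReadTextFileGuessingEncoding(NSString *path,
                                              NSStringEncoding *outEncoding,
                                              NSError **outError)
{
    NSStringEncoding encoding = 0;
    NSError *error = nil;

    // Step 1: let Foundation try to determine the encoding itself.
    NSString *text = [NSString stringWithContentsOfFile:path
                                           usedEncoding:&encoding
                                                  error:&error];

    // Step 2: if that failed, try UTF-8 explicitly.
    if (text == nil) {
        encoding = NSUTF8StringEncoding;
        text = [NSString stringWithContentsOfFile:path
                                         encoding:encoding
                                            error:&error];
    }

    // Step 3: if that failed too, try a legacy encoding.
    // Windows Latin 1 is an assumption made for this sketch.
    if (text == nil) {
        encoding = NSWindowsCP1252StringEncoding;
        text = [NSString stringWithContentsOfFile:path
                                         encoding:encoding
                                            error:&error];
    }

    if (text != nil) {
        if (outEncoding != NULL) *outEncoding = encoding;
    } else if (outError != NULL) {
        *outError = error;
    }
    return text;
}
```

A caller would pass the file path and check the result for nil. Keep in mind that a successful read in step 3 does not prove the guess was right: a single-byte legacy encoding will accept many byte sequences that were really written in some other encoding, so the result may still be garbled.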

Steve Wellens · Answer

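If the file is properly constructed, you can read its first few bytes and check whether they form a BOM (Byte Order Mark). For UTF-8, UTF-16, and UTF-32 the BOM is two to four bytes long, so reading four bytes is enough. Note that many UTF-8 files are written without a BOM, so its absence does not rule out UTF-8: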

http://en.wikipedia.org/wiki/Byte-order_mark
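
A BOM check along these lines might look like the sketch below. The function name EncodingFromBOM is hypothetical, and returning 0 for "no BOM found" is a convention chosen for this example (0 is not a valid NSStringEncoding value). The byte patterns are the standard ones for UTF-8, UTF-16, and UTF-32.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: inspects the first bytes of `data` for a Unicode BOM.
// Returns the matching NSStringEncoding, or 0 if no BOM is present.
static NSStringEncoding EncodingFromBOM(NSData *data)
{
    const unsigned char *b = data.bytes;
    NSUInteger n = data.length;

    // The 4-byte UTF-32 marks are tested before the UTF-16 ones, because the
    // UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
    // This is a heuristic: FF FE 00 00 could also be UTF-16 LE text whose
    // first character is U+0000, which is rare in real text files.
    if (n >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
        return NSUTF32BigEndianStringEncoding;
    if (n >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
        return NSUTF32LittleEndianStringEncoding;
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return NSUTF8StringEncoding;
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        return NSUTF16BigEndianStringEncoding;
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        return NSUTF16LittleEndianStringEncoding;

    return 0; // no BOM found; the encoding still has to be guessed
}
```

The data can come from [NSData dataWithContentsOfFile:path]. When the function returns 0, fall back to another strategy such as trying UTF-8 first. Decoding with one of the endian-specific encodings may leave the BOM in the string as a leading U+FEFF character, so a caller may want to skip those bytes before decoding.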

Tags: text, encoding, iphone, nsstring

nacho4d
