
Is utf-8 suitable for the text/plain MIME type?

I am exporting data through files. The output is base64 encoded data.

$data = base64_encode(serialize($data));

Which results in something like:

bGFzcyI6MTp7czo1OiJzZXR1cCI7YTo3Mzp7czoyNToicGFnZXNfY29udGFjdF91c19oZWFkbGlu

So I am wondering which charset is most suitable for this data (plain text). us-ascii seems sufficient, but utf-8 always seems like an error-proof default.

header('content-type: text/plain; charset=utf-8');
Igor Parra asked Mar 05 '12 19:03



1 Answer

You don't actually need a charset at all. 'text/plain' may be incorrect, though, because the data is not really text.

Even though it is compatible with ASCII, UTF-8 and latin1 (as ruakh mentioned), you should just treat it as a binary file.

Update

I wanted to clarify this a bit (after all the downvotes; come on guys, give me a chance!)

@dan04: UTF-8 is text, I didn't say it wasn't. Base64 is not. Base64 is also an encoding, but it can encode any binary sequence. Base64 is encoded in such a way that it is possible to wrap it in US-ASCII (and therefore also UTF-8 and latin1 / ISO-8859).

Base64 is still just a binary sequence though, and not by definition text. The fact that it uses octet values from the US-ASCII range (and is 'printable' by anything that reads US-ASCII) does not make it text.
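The point about charsets can be checked with a minimal sketch like the one below. It assumes a made-up sample array (`$payload` is hypothetical, not the asker's real data) and only shows that the encoded string stays inside the Base64 alphabet, so its bytes are the same whether a reader labels them US-ASCII, UTF-8 or latin1.

```php
<?php
// Hypothetical sample data standing in for the exported $data.
$payload = ['setup' => ['pages_contact_us_headline' => 'Contact us']];

$encoded = base64_encode(serialize($payload));

// Standard Base64 output uses only A-Z, a-z, 0-9, '+', '/' and up to two '=' for padding.
$usesBase64Alphabet = preg_match('#\A[A-Za-z0-9+/]*={0,2}\z#', $encoded) === 1;

// No byte is 0x80 or above, so the byte sequence is identical in
// US-ASCII, UTF-8 and ISO-8859-1 (latin1).
$hasHighBytes = preg_match('/[\x80-\xFF]/', $encoded) === 1;

// Decoding gives back the original value, whatever charset was declared.
$roundTrip = unserialize(base64_decode($encoded, true));
```

Whether that makes the payload "text" is the judgment call discussed here; the sketch only shows that the charset label cannot change how the bytes are read.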

This is also why Base64 does not have its own mimetype. It's considered a content-transfer encoding. (look it up!)

So the actual correct way to serve Base64 is with the mimetype of what the string contains, along with a Content-Transfer-Encoding header. For example, if you're encoding a jpeg, this is the correct format:

Content-Type: image/jpeg
Content-Transfer-Encoding: base64 

This is also why I feel that if you don't want to say anything about the contents of the string (or don't have this information), it's best to treat it as 'generic binary', e.g.:

Content-Type: application/octet-stream
Content-Transfer-Encoding: base64 
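In PHP, sending those two headers could look roughly like the sketch below. The helper name `base64ExportHeaders` and the sample `$data` are assumptions made for illustration. Note that Content-Transfer-Encoding comes from MIME (RFC 2045, used for mail); HTTP itself does not assign a meaning to it, so an HTTP client may simply ignore it.

```php
<?php
// Hypothetical helper: builds the header lines shown above for a given media type.
function base64ExportHeaders(string $mediaType = 'application/octet-stream'): array
{
    return [
        'Content-Type: ' . $mediaType,
        'Content-Transfer-Encoding: base64',
    ];
}

// Hypothetical export data; in the question this is the asker's own $data.
$data = ['example' => 'value'];
$body = base64_encode(serialize($data));

// Headers must go out before any output.
if (!headers_sent()) {
    foreach (base64ExportHeaders() as $line) {
        header($line);
    }
}

echo $body;
```

Calling `base64ExportHeaders('image/jpeg')` would give the jpeg variant from the earlier example.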
Evert answered Sep 28 '22 15:09