I am exporting data through files. The output is base64 encoded data.
$data = base64_encode(serialize($data));
Which results in something like:
bGFzcyI6MTp7czo1OiJzZXR1cCI7YTo3Mzp7czoyNToicGFnZXNfY29udGFjdF91c19oZWFkbGlu
So I am wondering what charset is more suitable for this data (plain text). us-ascii seems enough but utf-8 always seems an error-proof default.
header('content-type: text/plain; charset=utf-8');
Two primary MIME types are important for the role of default types: text/plain is the default value for textual files. A textual file should be human-readable and must not contain binary data. application/octet-stream is the default value for all other cases.
Plain text, the media type text/plain, is a format that contains readable characters, without separate structure or formatting data. The characters can in principle be displayed in any encoding. Traditionally this was mainly ASCII, nowadays mostly UTF-8 and possibly UTF-16 are used.
Yes, some byte values are invalid in UTF-8: 0xC0, 0xC1, and 0xF5 through 0xFF never appear in a well-formed UTF-8 sequence.
You won't actually even need a charset. 'text/plain' may be incorrect though, because it's also not really text.
Even though it is compatible with ASCII, UTF-8, and Latin-1 (as ruakh mentioned), you should just treat it as a binary file.
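To make the compatibility claim concrete, here is a minimal sketch that checks the encoded string against the Base64 alphabet and confirms it is pure ASCII, and therefore also valid UTF-8. The sample array is a hypothetical stand-in, since the original $data is not shown in the question.

```php
<?php
// Hypothetical sample payload; the real $data from the question is not shown.
$data = ['setup' => ['pages_contact_us_headline' => 'Contact us']];

$encoded = base64_encode(serialize($data));

// Standard Base64 output uses only A-Z, a-z, 0-9, '+', '/' and up to two '=' padding characters.
$isBase64Alphabet = preg_match('/\A[A-Za-z0-9+\/]*={0,2}\z/', $encoded) === 1;

// No byte is above 0x7F, so the string is valid US-ASCII.
$isAscii = preg_match('/[^\x00-\x7F]/', $encoded) === 0;

// With the /u modifier, preg_match() only succeeds on well-formed UTF-8 input.
$isUtf8 = preg_match('//u', $encoded) === 1;

var_dump($isBase64Alphabet, $isAscii, $isUtf8);
```

Because every byte falls in the ASCII range, declaring us-ascii, utf-8, or latin1 would all describe the same bytes; the choice of charset does not change what is sent.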
Update
I wanted to clarify this a bit (after all the downvotes; come on, guys, give me a chance!)
@dan04: UTF-8 is text; I didn't say it wasn't. Base64 is not. Base64 is also an encoding, but it can encode any binary sequence. It is designed so that its output can be carried as US-ASCII (and therefore also as UTF-8 and Latin-1 / ISO-8859-1).
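As a rough illustration of the point that Base64 can carry any binary sequence, the sketch below encodes a few bytes that are not valid UTF-8 on their own and decodes them back unchanged. The byte values are arbitrary examples picked for this demonstration, not data from the question.

```php
<?php
// Arbitrary example bytes; 0xC0, 0xFF and 0xF5 can never occur in well-formed UTF-8.
$binary = "\xC0\xFF\x00\xF5";

// The raw bytes are not text in UTF-8 terms (preg_match with /u fails on them).
$rawIsUtf8 = preg_match('//u', $binary) === 1;

// The Base64 form of the same bytes is plain ASCII, hence also valid UTF-8.
$encoded = base64_encode($binary);
$encodedIsUtf8 = preg_match('//u', $encoded) === 1;

// Strict decoding (second argument true) rejects characters outside the Base64 alphabet.
$decoded = base64_decode($encoded, true);
$roundTrips = ($decoded === $binary);

var_dump($rawIsUtf8, $encodedIsUtf8, $roundTrips);
```

The encoded string is printable, but what it represents is still the original byte sequence, which is the distinction being drawn here.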
Base64 is still just a binary sequence though, and not by definition text. The fact that it uses the same range of octet values as US-ASCII (and is 'printable' by anything that reads US-ASCII) does not make it text.
This is also why Base64 does not have its own MIME type. It's considered a content-transfer encoding. (Look it up!)
So the actual correct way to serve Base64 is with the MIME type of what the string contains, along with a Content-Transfer-Encoding header. For example, if you're encoding a JPEG, this is the correct format:
Content-Type: image/jpeg
Content-Transfer-Encoding: base64
This is also why I feel that if you don't want to say anything about the contents of the string (or don't have this information), it's best to treat it as 'generic binary', e.g.:
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64
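In PHP, the headers above could be sent along these lines. This is only a sketch: base64ExportHeaders() is a hypothetical helper name, and the payload is placeholder data. Note also that Content-Transfer-Encoding is defined for MIME messages (RFC 2045) rather than for HTTP, so an HTTP client may simply ignore it.

```php
<?php
// Hypothetical helper: builds the header lines for a Base64-encoded body.
// Falls back to 'generic binary' when the underlying media type is unknown.
function base64ExportHeaders(?string $mediaType = null): array
{
    return [
        'Content-Type: ' . ($mediaType ?? 'application/octet-stream'),
        'Content-Transfer-Encoding: base64',
    ];
}

// Placeholder payload standing in for the exported data.
$payload = base64_encode(serialize(['example' => 'value']));

// Headers can only be sent before any output has started.
if (!headers_sent()) {
    foreach (base64ExportHeaders() as $line) {
        header($line);
    }
}

echo $payload;
```

Passing a media type, for example base64ExportHeaders('image/jpeg'), would produce the JPEG variant shown earlier.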