Use CDATA to store raw binary streams?

Instead of paying the overhead of saving binary data as Base64, I was wondering whether you can store double-byte binary streams directly in XML files, using CDATA, commenting them out, or something similar?

Robin Rodricks asked Feb 02 '09 11:02

2 Answers

The NUL character ('\0' in C) is not valid anywhere in XML, not even as a character reference (&#0;).
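
A quick way to check this is to feed both forms to a parser. The sketch below uses Python's standard-library `xml.etree.ElementTree` (an expat-based parser); the `<data>` element and the `parses` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

def parses(document: bytes) -> bool:
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

print(parses(b"<data>plain text</data>"))  # ordinary well-formed document
print(parses(b"<data>\x00</data>"))        # literal NUL byte in content
print(parses(b"<data>&#0;</data>"))        # NUL written as a character reference
```

Only the first document should be accepted; the other two are rejected as not well-formed.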

Pete Kirkham answered Oct 14 '22 07:10


No, you can't use CDATA alone to inject binary data into an XML file.

In XML 1.0 (XML 1.1 is more permissive, but it still forbids NUL and does not allow literal control characters inside CDATA), the following restrictions apply to CDATA characters:

CData      ::=      (Char* - (Char* ']]>' Char*)) 
Char       ::=      #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

That means several characters are illegal, among them:

  • control characters 0x00 to 0x1F, except tabs (0x09), line feeds (0x0A) and carriage returns (0x0D)
  • byte sequences that are not valid UTF-8 (assuming the document is encoded as UTF-8), such as the byte 0xFF or the non-canonical (overlong) form 0b1100000x 0b10xxxxxx
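
The Char production above translates almost directly into a range check. This is a minimal sketch, not a validator: `is_xml10_char` and `illegal_byte_values` are hypothetical helper names, and the second function assumes each byte is read as a single code point (as in ISO-8859-1), which is the most forgiving case for raw binary.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def illegal_byte_values():
    """Byte values 0..255 that are not legal XML 1.0 characters,
    assuming one byte maps to one code point (ISO-8859-1 style)."""
    return [b for b in range(256) if not is_xml10_char(b)]

print(len(illegal_byte_values()))  # 29 of the 256 byte values are rejected
```

Even under that forgiving assumption, 29 byte values can never appear, so an arbitrary binary stream will almost always contain something the parser refuses.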

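In addition, in ordinary element content outside CDATA:

  • "<" is illegal (">" is technically allowed except at the end of "]]>", but it is usually escaped as well)
  • "&" is restricted to starting a valid reference (&amp; is OK, &eacute; only if a DTD declares it, &zajdalkdza; is not)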

So CDATA is just a way to allow "<", ">" and "&" unescaped, at the price of forbidding "]]>" instead. It does nothing about characters that are illegal in XML, Unicode or UTF-8, which is the main problem.
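
To make that trade-off concrete, here is a small sketch using Python's `xml.etree.ElementTree`. The `to_cdata` and `round_trip` helpers and the `<code>` element are invented for this example; splitting "]]>" across two adjacent CDATA sections is a common workaround, not something the answer above prescribes.

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

def round_trip(text: str) -> str:
    """Embed text in a hypothetical <code> element and read it back."""
    return ET.fromstring("<code>" + to_cdata(text) + "</code>").text

# Markup characters and even ']]>' survive once the terminator is split.
sample = "if (a < b && c[d[0]]> 1) { }"
print(round_trip(sample) == sample)

# A control character is still rejected, CDATA or not.
try:
    round_trip("\x01")
except ET.ParseError as err:
    print("rejected:", err)
```

The first check passes because CDATA only changes how markup characters are treated; the second fails because the Char restriction applies inside CDATA too.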

Solutions:

  1. Use Base64: about 33% overhead, but it is a standard with broad support in all programming languages
  2. Use BaseXML: only about 20% overhead, but implementations are still limited
  3. Don't embed binary data in XML if you can avoid it; transfer it separately
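
As an illustration of the first option, the sketch below round-trips arbitrary bytes through an XML element with Python's standard `base64` module. The `<blob>` element, its `encoding` attribute and the `embed`/`extract` helpers are assumptions made for this example, not part of any standard schema.

```python
import base64
import xml.etree.ElementTree as ET

def embed(payload: bytes) -> str:
    """Serialize payload as Base64 text inside a hypothetical <blob> element."""
    elem = ET.Element("blob", {"encoding": "base64"})
    elem.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(elem, encoding="unicode")

def extract(document: str) -> bytes:
    """Recover the original bytes from a document produced by embed()."""
    return base64.b64decode(ET.fromstring(document).text or "")

payload = bytes(range(256)) * 3  # 768 bytes covering every byte value, NUL included
document = embed(payload)

print(extract(document) == payload)                   # True
print(len(base64.b64encode(payload)) / len(payload))  # 1024 / 768, about 1.33
```

Every byte value survives, and the encoded text is 4 characters for every 3 input bytes, which is where the roughly 33% overhead comes from.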
KrisWebDev answered Oct 14 '22 06:10