As opposed to application/xml files, which could do anything, or normalizedString values, which replace every tab, line feed, and carriage return with a space character, I'm asking here specifically in the context of text/xml files with string values. For the sake of simplicity, let's say I'm only using ASCII characters in a UTF-8-encoded file.
Given the following two-line text string I wish to represent in XML:
Hello
World!
Which is the following bytes in memory:
0000: 48 65 6c 6c 6f 0d 0a 57 6f 72 6c 64 21 Hello..World!
According to RFC 2046, the canonical form of any text/* MIME type MUST (not should) represent a line break as a Carriage Return followed by a Linefeed (CRLF). In that light, the following XML fragment should be right:
<tag>Hello
World!</tag>
or
0000: 3c 74 61 67 3e 48 65 6c 6c 6f 0d 0a 57 6f 72 6c <tag>Hello..Worl
0010: 64 21 3c 2f 74 61 67 3e d!</tag>
But I regularly see files like the following:
<tag><![CDATA[Hello
World!]]></tag>
Or, even stranger:
<tag>Hello&#xD;
World!</tag>
Where the &#xD; character reference is followed by a single Linefeed character:
0000: 3c 74 61 67 3e 48 65 6c 6c 6f 26 23 78 44 3b 0a <tag>Hello&#xD;.
0010: 57 6f 72 6c 64 21 3c 2f 74 61 67 3e World!</tag>
What am I missing here? What's the correct way to represent multiple lines of text in an XML string value so that it can come out the other end unmolested?
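A literal CR (#xD), LF (#xA), or CRLF in the file is a valid line break in every case, as are a few other combinations. As noted in the spec's section on end-of-line handling, the parser translates all of these to a single LF (#xA) character before the application sees the text. A character reference such as &#xD; is not touched by that normalization, which is why serializers write it when a carriage return has to survive parsing.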
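To see this behavior for yourself, here is a minimal sketch using Python's standard xml.etree.ElementTree parser (any conforming parser should behave the same way). The helper names parse_text and to_element are made up for this illustration; to_element is just one possible way to write text so that a CR comes out the other end intact.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parse_text(xml_bytes: bytes) -> str:
    """Return the root element's text as the parser reports it."""
    return ET.fromstring(xml_bytes).text


def to_element(tag: str, text: str) -> str:
    """Hypothetical helper: serialize text so carriage returns survive parsing.

    escape() handles &, < and >; the extra mapping turns each CR into a
    character reference, which end-of-line normalization leaves alone.
    """
    return "<{0}>{1}</{0}>".format(tag, escape(text, {"\r": "&#xD;"}))


# Literal CRLF in element content: normalized to a single LF.
literal_crlf = parse_text(b"<tag>Hello\r\nWorld!</tag>")

# CDATA does not help: normalization applies to the raw input as well.
cdata_crlf = parse_text(b"<tag><![CDATA[Hello\r\nWorld!]]></tag>")

# Character reference followed by a literal LF: the CR is preserved.
char_ref = parse_text(b"<tag>Hello&#xD;\nWorld!</tag>")

# Writing with the helper and reading back returns the original string.
round_trip = parse_text(to_element("tag", "Hello\r\nWorld!").encode("utf-8"))

print(repr(literal_crlf))  # 'Hello\nWorld!'
print(repr(cdata_crlf))    # 'Hello\nWorld!'
print(repr(char_ref))      # 'Hello\r\nWorld!'
print(repr(round_trip))    # 'Hello\r\nWorld!'
```

So if the receiving application only needs "a line break", any of the literal forms is fine and it will see LF. If the exact CRLF bytes matter, the &#xD; form is the one that round-trips.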