 

What's the correct way to encode CR-LF line breaks in text/xml values?

As opposed to application/xml files, which could do anything, or normalizedString values, which convert all whitespace sequences to a single space character, I'm asking here specifically about text/xml files with string values. For the sake of simplicity, let's say I'm only using ASCII characters in a UTF-8-encoded file.

Given the following two-line text string I wish to represent in XML:

Hello
World!

which is stored as the following bytes in memory:

0000: 48 65 6c 6c 6f 0d 0a 57 6f 72 6c 64 21 Hello..World!
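To double-check byte dumps like the ones in this question, a small helper can regenerate them from the actual bytes. This is a minimal sketch in Python; hexdump is a hypothetical helper name (not a standard-library function), and its layout (offset, up to 16 hex bytes padded so rows line up, then the printable ASCII with '.' for control characters such as CR and LF) is only assumed to mirror the dumps shown here.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Hypothetical helper: render bytes as 'offset: hex bytes ascii' rows."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Control characters (CR 0x0d, LF 0x0a, ...) are shown as '.'
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # Pad short rows so the ASCII column stays aligned.
        rows.append(f"{offset:04x}: {hex_part.ljust(width * 3 - 1)} {ascii_part}")
    return "\n".join(rows)


print(hexdump("Hello\r\nWorld!".encode("utf-8")))
print(hexdump(b"<tag>Hello\r\nWorld!</tag>"))
```

Feeding it the exact bytes you write or read is a quick way to catch transcription slips such as a wrong byte for '>' (0x3e).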

According to RFC 2046, any text/* MIME type MUST (not should) represent a line break as a Carriage Return followed by a Linefeed (CRLF). In that light, the following XML fragment should be correct:

<tag>Hello
World!</tag>

or, byte for byte:

0000: 3c 74 61 67 3e 48 65 6c 6c 6f 0d 0a 57 6f 72 6c <tag>Hello..Worl
0010: 64 21 3c 2f 74 61 67 3e                         d!</tag>

But I regularly see files like the following:

<tag><![CDATA[Hello
World!]]></tag>

Or, even stranger:

<tag>Hello&#xD;
World!</tag>

where the &#xD; character reference is followed by a single Linefeed character:

0000: 3c 74 61 67 3e 48 65 6c 6c 6f 26 23 78 44 3b 0a <tag>Hello&#xD;.
0010: 57 6f 72 6c 64 21 3c 2f 74 61 67 3e             World!</tag>

What am I missing here? What's the correct way to represent multiple lines of text in an XML string value so that it can come out the other end unmolested?
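One way to see what a parser actually hands back for the three forms above is to parse each of them. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree (which uses the expat parser); the samples dictionary and its labels are just illustrative names.

```python
import xml.etree.ElementTree as ET

# The three serializations from the question, as raw UTF-8 bytes.
samples = {
    "literal CRLF": b"<tag>Hello\r\nWorld!</tag>",
    "CDATA with CRLF": b"<tag><![CDATA[Hello\r\nWorld!]]></tag>",
    "&#xD; then LF": b"<tag>Hello&#xD;\nWorld!</tag>",
}

for label, raw in samples.items():
    text = ET.fromstring(raw).text
    # repr() makes the CR (\r) and LF (\n) characters visible.
    print(f"{label:16} -> {text!r}")
```

With a conforming parser, the first two come back as 'Hello\nWorld!' (the CRLF pair is collapsed to LF, even inside CDATA), while only the third, written with a character reference, returns 'Hello\r\nWorld!'.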

Asked Feb 22 '13 at 02:02 by AlwaysLearning


1 Answer

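A literal CR (#xD), a literal LF (#xA), or a CRLF pair are all valid line breaks (XML 1.1 also accepts a few other combinations involving #x85 and #x2028). As noted in the spec (XML 1.0, section 2.11, "End-of-Line Handling"), the parser translates all of these, including ones inside CDATA sections, into a single LF (#xA) character before the application sees the text. So a CRLF written literally into the file arrives as a bare LF. The exception is a character reference: &#xD; is expanded after end-of-line handling, so it survives as a real CR. That is why some writers emit Hello&#xD; followed by a literal LF when the original value really contained CRLF.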
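On the writing side, a value containing CRLF can be made to survive a round trip by escaping each CR as a character reference. This is a minimal sketch, assuming Python's standard library; to_xml_text is a hypothetical helper, and it relies on xml.sax.saxutils.escape, which always escapes &, < and > and then applies the extra replacements passed in its entities dictionary.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_xml_text(value: str) -> str:
    """Hypothetical helper: escape a string for XML element content.

    &, < and > are escaped as usual; each CR becomes the character
    reference &#xD;, which end-of-line handling leaves alone. LF stays
    a literal line break.
    """
    return escape(value, {"\r": "&#xD;"})


original = "Hello\r\nWorld!"
document = f"<tag>{to_xml_text(original)}</tag>"
print(document.encode("utf-8"))  # b'<tag>Hello&#xD;\nWorld!</tag>'

# Parsing the document back yields the original string, CR included.
round_tripped = ET.fromstring(document.encode("utf-8")).text
print(round_tripped == original)  # True
```

This produces exactly the "stranger" form from the question. Not every serializer escapes CR this way, so if exact line endings matter, it is worth checking what yours emits.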

Answered Oct 04 '22 at 18:10 by Eric Galluzzo