
LINQ to XML ignores line breaks in attributes

According to this question:

Are line breaks in XML attribute values allowed?

line breaks in XML attributes are perfectly valid (although perhaps not recommended):

<xmltag1>
    <xmltag2 attrib="line 1
line 2
line 3">
    </xmltag2>
</xmltag1>

When I parse such XML using LINQ to XML (System.Xml.Linq), those line breaks are silently converted to space (' ') characters.

Is there any way to tell the XDocument.Load() parser to preserve those line breaks?

P.S.: The XML I'm parsing is written by third-party software, so I cannot change the way the line breaks are written.

cheesus asked Jul 13 '12 08:07


2 Answers

If you want line breaks in attribute values to be preserved, then you need to write them as character references, e.g.

<foo bar="Line 1.&#10;Line 2.&#10;Line3."/>

as otherwise the XML parser will normalize them to spaces, according to the XML specification (http://www.w3.org/TR/xml/#AVNormalize).
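
To illustrate the difference, here is a minimal sketch that parses the same attribute twice with the default XDocument.Parse settings: once with a literal line break and once with a &#10; character reference. The class and method names (AttributeNormalizationDemo, ReadBar) are made up for this illustration only.

```csharp
using System;
using System.Xml.Linq;

static class AttributeNormalizationDemo
{
    // Hypothetical helper for this illustration: parses a one-element
    // document and returns the value of its "bar" attribute.
    public static string ReadBar(string xml)
    {
        return XDocument.Parse(xml).Root.Attribute("bar").Value;
    }

    static void Main()
    {
        // A literal line feed inside the attribute value.
        string literal = ReadBar("<foo bar=\"Line 1.\nLine 2.\"/>");

        // The same line break written as a character reference.
        string escaped = ReadBar("<foo bar=\"Line 1.&#10;Line 2.\"/>");

        Console.WriteLine(literal.Contains("\n")); // False: normalized to a space
        Console.WriteLine(escaped.Contains("\n")); // True: the reference survives
    }
}
```

This only helps if you control how the XML is written, which is why the alternative below matters for third-party files.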

[edit] If you want to avoid the attribute value normalization then loading the XML with a legacy XmlTextReader helps:

using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

string testXml = @"<foo bar=""Line 1.
Line 2.
Line 3.""/>";

XDocument test;
using (XmlTextReader xtr = new XmlTextReader(new StringReader(testXml)))
{
    // Turn off white space and attribute value normalization.
    xtr.Normalization = false;
    test = XDocument.Load(xtr);
}
Console.WriteLine("|{0}|", test.Root.Attribute("bar").Value);

That outputs

|Line 1.
Line 2.
Line 3.|
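
Since the question is about XDocument.Load(), the same approach could be wrapped in a small helper, roughly as sketched here. RawAttributeXml and its Load method are hypothetical names introduced for this example, not part of System.Xml.Linq; the sample document is the one from the question.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class RawAttributeXml
{
    // Hypothetical helper: loads a document through a legacy XmlTextReader
    // with normalization switched off, so literal line breaks in attribute
    // values are not replaced by spaces.
    public static XDocument Load(TextReader input)
    {
        using (var reader = new XmlTextReader(input))
        {
            reader.Normalization = false;
            return XDocument.Load(reader);
        }
    }

    static void Main()
    {
        string xml =
            "<xmltag1><xmltag2 attrib=\"line 1\nline 2\nline 3\"></xmltag2></xmltag1>";

        XDocument doc = Load(new StringReader(xml));
        string value = doc.Root.Element("xmltag2").Attribute("attrib").Value;

        // Count the lines in the attribute value; 3 if the breaks were kept.
        Console.WriteLine(value.Split('\n').Length);
    }
}
```

XmlTextReader also has constructors that take a Stream or a URL string, which may be the better fit when reading a file whose encoding is declared in the XML itself.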

Martin Honnen answered Oct 06 '22 14:10


According to MSDN:

Although XML processors preserve all white space in element content, they frequently normalize it in attribute values. Tabs, carriage returns, and spaces are reported as single spaces. In certain types of attributes, they trim white space that comes before or after the main body of the value and reduce white space within the value to single spaces. (If a DTD is available, this trimming will be performed on all attributes that are not of type CDATA.)

For example, an XML document might contain the following:

<whiteSpaceLoss note1="this is a note." note2="this
is
a
note.">

An XML parser reports both attribute values as "this is a note.", converting the line breaks to single spaces.

I can't find anything about preserving white space in attribute values, but judging by this explanation, I guess it may be impossible.

mmdemirbas answered Oct 06 '22 13:10