
Writing XML with at most two tags per line

I am saving XML from .NET's XElement. I've been using the ToString method, but the formatting isn't what I'd like (examples below). I'd like at most two tags per line. How can I achieve that?


Saving XElement.Parse("<a><b><c>one</c><c>two</c></b><b>three<c>four</c><c>five</c></b></a>").ToString() gives me

<a>
  <b>
    <c>one</c>
    <c>two</c>
  </b>
  <b>three<c>four</c><c>five</c></b>
</a>

But for readability I would rather 'three', 'four' and 'five' were on separate lines:

<a>
  <b>
    <c>one</c>
    <c>two</c>
  </b>
  <b>three
    <c>four</c>
    <c>five</c>
  </b>
</a>

Edit: Yes, I understand the two documents are not equivalent and that this is "not in the spirit of XML", but I'm being pragmatic. Recently I've seen megabyte-size XML files with as few as 3 lines, which are challenging for text editors, source control, and diff tools. Something needs to be done! I've tested that the formatting change above is compatible with our application.

Colonel Panic asked Oct 16 '12 10:10


1 Answer

If you want exactly that output, you'll need to do it manually, adding whitespace around nodes as necessary.

Almost all whitespace in an XML document is potentially significant, even where we think of it only as indentation. When we ask a serializer to indent a document for us, it changes content that an application can later extract, so serializers try to be as conservative as possible. The elements

<tag>foo</tag>

and

<tag>
    foo
</tag>

have different content, and if a serializer changed the former into the latter, it would change what you get back from your XML API when asking for the contents of <tag>.
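
The difference can be checked directly with LINQ to XML. This is a small sketch rather than anything from the question; the class name WhitespaceDemo is made up for illustration, and it assumes the default XElement.Parse behaviour of keeping whitespace that sits inside a text node alongside other characters.

```csharp
using System;
using System.Xml.Linq;

static class WhitespaceDemo
{
    static void Main()
    {
        // Same element name, same word, different surrounding whitespace.
        XElement compact = XElement.Parse("<tag>foo</tag>");
        XElement indented = XElement.Parse("<tag>\n    foo\n</tag>");

        // The text content differs: "foo" versus "\n    foo\n".
        Console.WriteLine(compact.Value == indented.Value);        // False
        Console.WriteLine(compact.Value.Length);                   // 3
        Console.WriteLine(indented.Value.Length);                  // 9

        // A consumer would have to trim explicitly to treat them as equal.
        Console.WriteLine(compact.Value == indented.Value.Trim()); // True
    }
}
```

Any application that reads the value without trimming will see the extra newline and spaces, which is why a serializer will not add them on its own.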

The usual rule of thumb is that no indenting is applied if there's any existing non-whitespace text between the elements. In this case, the text three between the tags would be modified if a serializer applied the indenting you desire, so nothing will do it for you automatically.
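
One way to do it manually, as a rough sketch: build a copy of the tree with explicit whitespace text nodes before each child element, then serialize with SaveOptions.DisableFormatting so the writer adds no indentation of its own. The XmlLineBreaker class and its BreakLines method are hypothetical helpers written for this answer, not part of .NET, and the two-space indent is an arbitrary choice. The inserted whitespace becomes part of the text content, so this is only suitable when the consuming application tolerates it.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class XmlLineBreaker
{
    // Hypothetical helper: returns a copy of `source` in which every child
    // element starts on its own indented line, even inside mixed content.
    public static XElement BreakLines(XElement source, int depth = 0)
    {
        // Attributes that already have a parent are cloned when added.
        var copy = new XElement(source.Name, source.Attributes());

        // Elements without child elements (text only, or empty) stay on one line.
        if (!source.Elements().Any())
        {
            copy.Add(source.Nodes());
            return copy;
        }

        string childIndent = "\n" + new string(' ', 2 * (depth + 1));
        foreach (XNode node in source.Nodes())
        {
            if (node is XElement child)
            {
                // Put each child element on a new, indented line.
                copy.Add(new XText(childIndent));
                copy.Add(BreakLines(child, depth + 1));
            }
            else
            {
                // Text, comments, etc. are copied through unchanged.
                copy.Add(node);
            }
        }

        // Place the closing tag on its own line at the parent's indent.
        copy.Add(new XText("\n" + new string(' ', 2 * depth)));
        return copy;
    }
}

static class Demo
{
    static void Main()
    {
        XElement doc = XElement.Parse(
            "<a><b><c>one</c><c>two</c></b><b>three<c>four</c><c>five</c></b></a>");

        // DisableFormatting stops the writer from adding indentation itself.
        Console.WriteLine(
            XmlLineBreaker.BreakLines(doc).ToString(SaveOptions.DisableFormatting));
    }
}
```

For the document in the question this should print the layout asked for, with three staying next to the opening <b> tag and each <c> on its own line. The sketch assumes the input was parsed without LoadOptions.PreserveWhitespace; otherwise existing whitespace nodes would be kept alongside the inserted ones.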


If you have control over the XML format, it's inadvisable to mix element and text children like this, where <b> has both text (three) and element (<c>) children, as it causes issues like what you're seeing.

Jason Viers answered Oct 25 '22 01:10