In my application, I alter some part of XML files, which begin like this:
<?xml version="1.0" encoding="UTF-8"?>
<!-- $Id: version control yadda-yadda $ -->
<myElement>
...
Note the blank line before <myElement>
. After loading, altering and saving, the result is far from pleasing:
<?xml version="1.0" encoding="UTF-8"?>
<!-- $Id: version control yadda-yadda $ --><myElement>
...
I found out that the whitespace (one newline) between the comment and the document node is not represented in the DOM at all. The following self-contained code reproduces the issue reliably:
String source =
"<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n<!-- foo -->\n<empty/>";
byte[] sourceBytes = source.getBytes("UTF-16");
DocumentBuilder builder =
DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc =
builder.parse(new ByteInputStream(sourceBytes, sourceBytes.length));
DOMImplementationLS domImplementation =
(DOMImplementationLS) doc.getImplementation();
LSSerializer lsSerializer = domImplementation.createLSSerializer();
System.out.println(lsSerializer.writeToString(doc));
// output: <?xml version="1.0" encoding="UTF-16"?>\n<!-- foo --><empty/>
Does anyone have an idea how to avoid this? Essentially, I want the output to be the same as the input. (I know that the xml declaration will be regenerated because it's not part of the DOM, but that's not an issue here.)
I had the same problem. My solution was to write my own XML parser: DecentXML
Main feature: it can 100% preserve the original input, whitespace, entities, everything. It won't bother you with the details, but if your code needs to generate XML like this:
<element
attr="some complex value"
/>
then you can.
The root cause is that the standard DOM Level 3 cannot represent Text nodes as children of a Document without breaking the spec. Whitespace will be dropped by any compliant parser.
Document --
Element (maximum of one),
ProcessingInstruction,
Comment,
DocumentType (maximum of one)
If you require a standards-compliant solution and the objective is readability rather than 100% reproduction, I would look for it in your output mechanism.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With