
How to prevent the XML Transformer from changing line endings

Tags: java, dom, xml

I have a method that edits an XML file. The general outline of the method is:

public void process(Path anXmlFile) {
    try {
        anXmlFile= anXmlFile.normalize();
        log.debug("processing {}",anXmlFile);
        Document dom = buildDOM(anXmlFile.toFile());

        //do stuff with dom...
        //delete original file
        //and finally ...
        dom.normalize(); //so we get a more predictable order

        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING,"UTF-8");
        transformer.setOutputProperty(OutputKeys.INDENT,"yes");
        Source source = new DOMSource(dom);
        Result result = new StreamResult(anXmlFile.toFile());
        transformer.transform(source, result);
    } catch (Exception e) {
        throw new IllegalStateException(e);
    }
}

My problem is that if the XML contains a multi-line comment that opens on one line and closes on a following line (note the line break characters):

<!-- this is a long comment[cr][lf] 
     that spans 2 lines -->

then after I write out the modified DOM the result is:

<!-- this is a long comment[cr] 
     that spans 2 lines -->

The problem is that [cr][lf] turned into [cr]. This is the only part of the XML affected in this way. All other line endings are the same as in the original ([cr][lf]), even those I've modified (my code doesn't change the comment nodes in the DOM).

Is there any configuration option I can give to the Transformer I create to avoid this? This is all done using JDK classes, with no external XML libraries involved.

radai asked Jan 24 '13 06:01


1 Answer

The XML specification puts a requirement on XML processors (parsers) to replace \r\n or just \r with a single \n. So if you inspect your DOM text nodes, you will see that you only have \n as line endings.
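A minimal sketch for checking this with JDK classes only. The class name `CommentLineEndings`, the helper `firstCommentData`, and the sample document are made up for illustration; the sketch parses a string whose comment contains \r\n and reports which line-break characters survive in the comment node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question.
class CommentLineEndings {

    /** Parses the XML and returns the text of the first comment child of the root element. */
    static String firstCommentData(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = builder.parse(new InputSource(new StringReader(xml)));
            Node child = dom.getDocumentElement().getFirstChild();
            // Skip any non-comment children until the first comment is found.
            while (child != null && child.getNodeType() != Node.COMMENT_NODE) {
                child = child.getNextSibling();
            }
            return child == null ? null : child.getNodeValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The input comment uses CR LF as its line break.
        String data = firstCommentData("<root><!-- line one\r\nline two --></root>");
        System.out.println("contains CR: " + data.contains("\r"));
        System.out.println("contains LF: " + data.contains("\n"));
    }
}
```

Running the same check against a comment node from your own file shows what the serializer actually receives for that comment.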

When serializing the DOM tree, most implementations use the platform default when writing line breaks that occur in character data, or they give you an option to explicitly set the end-of-line string. However, comment text is not character data; the characters are just written as they are without any other processing. At least, this is how most serializers behave.
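If comment text really is written verbatim, one JDK-only workaround is to put the desired line endings into the comment nodes just before serializing. The following is a sketch under that assumption, not a documented Transformer option; `CommentEolFixer`, `useCrLfInComments` and `roundTrip` are hypothetical names, and the behaviour should be verified against the serializer in your JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
class CommentEolFixer {

    /** Recursively rewrites every comment node so that its line breaks are CR LF. */
    static void useCrLfInComments(Node node) {
        if (node.getNodeType() == Node.COMMENT_NODE) {
            Comment comment = (Comment) node;
            // Collapse any existing CR LF first so it cannot become CR CR LF.
            String text = comment.getData().replace("\r\n", "\n").replace("\n", "\r\n");
            comment.setData(text);
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            useCrLfInComments(c);
        }
    }

    /** Parses the XML, fixes the comment line endings, and serializes the DOM to a string. */
    static String roundTrip(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            useCrLfInComments(dom); // starts at the document node, so top-level comments are covered
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String result = roundTrip("<root><!-- line one\r\nline two --></root>");
        System.out.println("CR LF kept in comment: " + result.contains("line one\r\nline two"));
    }
}
```

In the method from the question, the call to `useCrLfInComments(dom)` would go right before `transformer.transform(source, result)`.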

If it is terribly important, you could switch to JDOM and extend AbstractXMLOutputProcessor to change the way comments are written.

forty-two answered Oct 06 '22 02:10