How should I deal with linebreaks in strings I want to marshal in Java to XML?

I am having difficulty using Java and JAXB to write strings that contain line breaks into XML files. The data is pulled from a database and contains the actual line-break characters.

Foo <LF>
bar

Or an additional example:

Foo\r\n\r\nBar

Yields:

Foo&#xD;
&#xD;
Bar

If I just marshal this data into XML, I get literal line-break characters in the output. This apparently goes against the XML standard, under which the character should be encoded as &#xD;. That is, in the XML file output I should see:

Foo &#xD;bar

But if I try to do this manually, my ampersand ends up getting encoded!

Foo &amp;#xD;bar

This is pretty ironic: the process that is apparently supposed to encode the line breaks in the first place, but does not, is foiling my attempts to encode them manually.

deed02392 asked Aug 07 '13 16:08


1 Answer

Below is an example of JAXB's default behaviour regarding \n and \r:

Java Model (Root)

import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Root {

    private String foo;
    private String bar;

    public String getFoo() {
        return foo;
    }

    public void setFoo(String foo) {
        this.foo = foo;
    }

    public String getBar() {
        return bar;
    }

    public void setBar(String bar) {
        this.bar = bar;
    }

}

Demo Code

import javax.xml.bind.*;

public class Demo {

    public static void main(String[] args) throws Exception {
        JAXBContext jc = JAXBContext.newInstance(Root.class);

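        Root root = new Root();
        root.setFoo("Hello\rWorld"); // contains a carriage return (CR)
        root.setBar("Hello\nWorld"); // contains a line feed (LF)

        // Marshal to an OutputStream (System.out) with the default settings.
        Marshaller marshaller = jc.createMarshaller();
        marshaller.marshal(root, System.out);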
    }

}

Output

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><root><bar>Hello
World</bar><foo>Hello&#xD;World</foo></root>
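
In the output above, the line feed in bar is written literally while the carriage return in foo is written as &#xD;. A minimal sketch of why that distinction matters, using only the JDK's DOM parser rather than JAXB: under the XML 1.0 end-of-line rules a parser turns a literal carriage return into a line feed, while a character reference survives parsing. The class name LineEndingRoundTrip and its helper methods are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JAXB or of the original answer.
public class LineEndingRoundTrip {

    // Parses a small XML document and returns the text content of its root element.
    static String parseText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    // Makes CR and LF visible so the results can be compared by eye.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Literal CR in the document: normalized to LF by the parser.
        System.out.println(visible(parseText("<foo>Hello\rWorld</foo>")));     // prints Hello\nWorld
        // CR written as a character reference: preserved.
        System.out.println(visible(parseText("<foo>Hello&#xD;World</foo>")));  // prints Hello\rWorld
        // Literal LF: already the normalized form, so it is unchanged.
        System.out.println(visible(parseText("<bar>Hello\nWorld</bar>")));     // prints Hello\nWorld
    }
}
```

So a literal line feed in the marshalled output is not lost on re-reading, whereas a carriage return only round-trips if the marshaller escapes it.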

UPDATE

Below are some additional details from further investigation.

Common to All JAXB (JSR-222) Implementations

  • If you are marshalling to an XMLStreamWriter or XMLEventWriter directly (via Marshaller) or indirectly (potentially via a JAX-RS or JAX-WS provider), then the escaping is determined by the StAX implementation. Woodstox appears to escape things correctly, but the StAX implementation in the JDK I'm using did not.
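
Since the result depends on which StAX implementation is on the classpath, one way to find out is to write a carriage return through an XMLStreamWriter directly and inspect what comes out. The following is a rough sketch that uses only the standard javax.xml.stream API and no JAXB; the class name StaxCarriageReturnCheck and its methods are invented for the example, and the result will vary by implementation and version.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical check class, not part of any library.
public class StaxCarriageReturnCheck {

    // Writes <foo>text</foo> with whichever StAX implementation is found first.
    static String writeFoo(String text) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("foo");
            writer.writeCharacters(text);
            writer.writeEndElement();
            writer.flush();
            writer.close();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("StAX write failed", e);
        }
    }

    // True if the carriage return reached the output unescaped.
    static boolean writesRawCarriageReturn() {
        return writeFoo("Hello\rWorld").contains("\r");
    }

    public static void main(String[] args) {
        System.out.println("StAX factory: " + XMLOutputFactory.newInstance().getClass().getName());
        System.out.println("Writes raw \\r: " + writesRawCarriageReturn());
    }
}
```

If the check reports a raw carriage return, marshalling through that XMLStreamWriter will presumably lose the character on re-reading, and switching the StAX implementation (for example to Woodstox, as noted above) is one option to consider.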

EclipseLink JAXB (MOXy)

  • There is a bug in MOXy related to escaping \r that I am currently in the process of fixing (see: http://bugs.eclipse.org/414608)

JAXB Reference Implementation

  • The JAXB reference implementation will properly escape '\r' when marshalling to an OutputStream, but not to a Writer, at least in the JDK I'm using.
bdoughan answered Sep 28 '22 13:09