Latest Open JDK 8 JAXB library fails to marshal objects with properties that contain new line characters

I am using Java on Ubuntu 16.04. Recently I upgraded to Open JDK Java version "1.8.0_161", installed using the oracle-java8-installer package (package version 8u161-1~webupd8~0). Since this upgrade, I have been getting new exceptions when marshalling Java objects with JAXB.

Specifically, when I use JAXB to marshal a Java object to XML, I get the exception below if the Java object has a String property that contains any newline ("\n") characters and that property is serialized as element content in the XML. (As an aside, if the String property is serialized as attribute content, any newline character in its value is converted to a space character and the exception is not triggered.)
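The attribute/element difference in the aside is consistent with XML 1.0 attribute-value normalization: a parser replaces a literal newline inside an attribute value with a space, leaves newlines in element content alone, and keeps a character reference such as &#xa; in both places. A minimal sketch using the JDK's built-in DOM parser (the class and method names are illustrative, not part of any library):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class NewlineNormalization {
    // Parse an XML string with the JDK's built-in DOM parser and return the root element
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    static String attribute(String xml, String name) {
        return root(xml).getAttribute(name);
    }

    static String text(String xml) {
        return root(xml).getTextContent();
    }

    public static void main(String[] args) {
        // Literal newline in an attribute value: normalized to a space when parsed
        System.out.println("[" + attribute("<note text=\"line one\nline two\"/>", "text") + "]");
        // The same literal newline in element content is preserved
        System.out.println("[" + text("<note>line one\nline two</note>") + "]");
        // The character reference &#xa; survives attribute-value normalization
        System.out.println("[" + attribute("<note text=\"line one&#xa;line two\"/>", "text") + "]");
    }
}
```

In other words, a raw newline is safe in element content but not in an attribute, which is why escaping it only matters for attributes.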

What appears to be happening is that

com.sun.xml.internal.bind.v2.runtime.output.XMLStreamWriterOutput$NewLineEscapeHandler.escape

converts the newline character in the String property of the Java object to the entity reference &#xa;, which it writes by calling writeEntityRef("#xa") on the underlying XMLStreamWriter. When that writer (Aalto, in my case) verifies the entity reference name, it throws the exception, because #xa is not recognised as a valid entity reference name.
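To make the failing call concrete, here is a minimal sketch (EntityRefDemo and writeWithEntityRef are hypothetical names) that issues roughly the sequence of StAX calls the stack trace implies for the text "a\nb". It assumes XMLOutputFactory.newInstance() picks up the JDK's built-in StAX implementation (no Aalto or Woodstox on the classpath). That writer does not appear to validate the name, so this should print <note>a&#xa;b</note>, a valid character reference; Aalto checks "#xa" as an XML Name and throws, because a Name cannot start with '#'.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class EntityRefDemo {
    // Roughly the calls NewLineEscapeHandler makes for the text "a\nb":
    // plain text, then writeEntityRef("#xa") for the newline, then more text
    static String writeWithEntityRef() {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("note");
            w.writeCharacters("a");
            // "#xa" is the body of a character reference (&#xa;), not an XML Name;
            // a Name may not start with '#', which is exactly what Aalto reports
            w.writeEntityRef("#xa");
            w.writeCharacters("b");
            w.writeEndElement();
            w.flush();
            return out.toString();
        } catch (XMLStreamException e) {
            // A name-validating writer (e.g. Aalto) fails here instead
            throw new IllegalStateException("writer rejected the entity reference", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeWithEntityRef());
    }
}
```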

Is this the expected behaviour? If so, what should I do to preserve the newline characters in the serialization of the Java object? If not, what should I do to work around this problem?

The relevant part of the stack trace is:

    ... Caused by: javax.xml.stream.XMLStreamException: Invalid name start character '#' (code 35) (name "#xa")
        at com.fasterxml.aalto.out.XmlWriter.throwOutputError(XmlWriter.java:472)
        at com.fasterxml.aalto.out.XmlWriter.reportNwfName(XmlWriter.java:383)
        at com.fasterxml.aalto.out.ByteXmlWriter.verifyNameComponent(ByteXmlWriter.java:235)
        at com.fasterxml.aalto.out.ByteXmlWriter.constructName(ByteXmlWriter.java:181)
        at com.fasterxml.aalto.out.WNameTable.findSymbol(WNameTable.java:324)
        at com.fasterxml.aalto.out.StreamWriterBase.writeEntityRef(StreamWriterBase.java:615)
        at net.galexy.fieldguide.jaxb.CustomXMLStreamWriter.writeEntityRef(CustomXMLStreamWriter.java:198)
        at com.sun.xml.internal.bind.v2.runtime.output.XMLStreamWriterOutput$XmlStreamOutWriterAdapter.writeEntityRef(XMLStreamWriterOutput.java:277)
        at com.sun.xml.internal.bind.v2.runtime.output.XMLStreamWriterOutput$NewLineEscapeHandler.escape(XMLStreamWriterOutput.java:242)
        ... 60 more

For example, if I unmarshal the following XML:

    <?xml version='1.0' encoding='UTF-8'?>
    <description>
       <note>The text of the note</note>
    </description>

and then marshal it back to XML, no exception is thrown.

If, however, there is a newline in the middle of the note content:

    <?xml version='1.0' encoding='UTF-8'?>
    <description>
       <note>The text of
             the note</note>
    </description>

then the exception is thrown.

The JAXB context being used is com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.

The JAXB marshaller being used is com.sun.xml.internal.bind.v2.runtime.MarshallerImpl.

Looking for more information on the changes, I came across the following bug report that suggests others have encountered the same change with this release of JAXB:

JDK-8196491 Newlines in JAXB string values of SOAP-requests are escaped to "&#xa;"

The answer to this Stack Overflow question suggests that I can regain control over character escaping by making my marshaller use a custom implementation of com.sun.xml.bind.marshaller.CharacterEscapeHandler.

This puzzles me, because javax.xml.bind.Marshaller does not appear to declare a property name constant for com.sun.xml.bind.marshaller.CharacterEscapeHandler, although it does declare other property names such as Marshaller.JAXB_FORMATTED_OUTPUT, which equals "jaxb.formatted.output".

Even if I could instruct the marshaller to use my custom character escape handler, I am not sure what I should do within that escape handler. Is there an appropriate base escape handler that I can override, so that I inherit all of the standard escape handling while intervening to stop the escaping of newline characters?

I have also tried Oracle Java 9 (package version 9.0.4-1~webupd8~0) and that version of Java has the same issues.

I have also tried the next release of Oracle Java 8 (1.8.0_162) and that version has the same issues.

Downloading an older version of Java (1.8.0_152) from the Oracle website fixes the problem, but that is not a satisfactory solution.

Geoff S asked Feb 04 '18 01:02



1 Answer

In my case, I'm using JAXB to convert a few objects into XML and serialise them to a file via StAX/Woodstox. I've managed to work around the problem by filtering the XML that is being serialised. The approach is:

  1. Define a custom StreamWriter2Delegate and override writeEntityRef() so that, when it receives one of the wrong entity codes (#xd or #xa), it has its delegate write back the original character instead (\r or \n, respectively), which doesn't actually need to be escaped:

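    import javax.xml.stream.XMLStreamException;
    import org.codehaus.stax2.XMLStreamWriter2;
    import org.codehaus.stax2.util.StreamWriter2Delegate;

    public class NewLineFixWriterFilter extends StreamWriter2Delegate
    {
        public NewLineFixWriterFilter ( XMLStreamWriter2 parent ) {
            super ( parent );
        }

        // JAXB's NewLineEscapeHandler calls writeEntityRef ( "#xa" ) / ( "#xd" ):
        // write the original '\n' / '\r' as character data instead
        @Override
        public void writeEntityRef ( String eref ) throws XMLStreamException
        {
            if ( eref == null || !eref.startsWith ( "#x" ) ) {
                super.writeEntityRef ( eref );
                return;
            }
            String hex = eref.substring ( 2 );
            for ( char c: new char[] { '\r', '\n' } ) {
                // Integer.toHexString ( '\r' ) is "d", Integer.toHexString ( '\n' ) is "a"
                if ( Integer.toHexString ( c ).equals ( hex ) ) {
                    this.writeCharacters ( Character.toString ( c ) );
                    return;
                }
            }
            super.writeEntityRef ( eref );
        }
    }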

This is equivalent (apart from some overhead) to the fix that has already been filed for this problem, which should be available in JDK 8u192 (and should already be in JDK 9/10).

  2. Wrap your XMLStreamWriter2 with the above filter, for instance:

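    import java.io.FileOutputStream;
    import com.ctc.wstx.io.CharsetNames;
    import com.ctc.wstx.stax.WstxOutputFactory;
    import org.codehaus.stax2.XMLStreamWriter2;

    FileOutputStream fout = new FileOutputStream ( "test.xml" );
    // Instantiate Woodstox directly: XMLOutputFactory.newInstance() is a provider lookup
    // and isn't guaranteed to return a WstxOutputFactory (e.g. if Aalto is also present)
    WstxOutputFactory wsof = new WstxOutputFactory ();
    XMLStreamWriter2 xmlOut = (XMLStreamWriter2) wsof.createXMLStreamWriter ( fout, CharsetNames.CS_UTF8 );
    xmlOut = new NewLineFixWriterFilter ( xmlOut );
    // Now write into xmlOut, directly or via JAXB, e.g. marshaller.marshal ( obj, xmlOut )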

The complete/production code is here. It shouldn't be difficult to adapt the same approach to similar pipelines. In general, the problem occurs because com.sun.xml.internal.bind.v2.runtime.output.XMLStreamWriterOutput escapes \n and \r the wrong way, passing #xa and #xd to writeEntityRef() as if they were entity names, so the trick is to intercept that call before it reaches the underlying writer.
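If you'd rather not depend on Woodstox's Stax2 delegate classes (the question's pipeline, for instance, uses Aalto behind its own CustomXMLStreamWriter), one option is a java.lang.reflect.Proxy around the plain javax.xml.stream.XMLStreamWriter interface; the standard javax.xml.stream.util package only ships reader-side delegates. This is a minimal sketch, not the production code linked above: NewLineFixProxy, wrap() and demo() are hypothetical names, and like the filter above it only rewrites #xa and #xd.

```java
import java.io.StringWriter;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class NewLineFixProxy {
    // Wraps any XMLStreamWriter so that writeEntityRef("#xa") / writeEntityRef("#xd")
    // write the raw '\n' / '\r' as character data; every other call is forwarded unchanged
    static XMLStreamWriter wrap(XMLStreamWriter target) {
        return (XMLStreamWriter) Proxy.newProxyInstance(
                NewLineFixProxy.class.getClassLoader(),
                new Class<?>[] { XMLStreamWriter.class },
                (proxy, method, args) -> {
                    if (method.getName().equals("writeEntityRef") && args != null && args.length == 1) {
                        if ("#xa".equals(args[0])) { target.writeCharacters("\n"); return null; }
                        if ("#xd".equals(args[0])) { target.writeCharacters("\r"); return null; }
                    }
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        // Rethrow the writer's own exception (e.g. XMLStreamException), not the reflective wrapper
                        throw e.getCause();
                    }
                });
    }

    // Demo: the same call sequence NewLineEscapeHandler produces for "a\nb", through the wrapper
    static String demo() {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = wrap(XMLOutputFactory.newInstance().createXMLStreamWriter(out));
            w.writeStartElement("note");
            w.writeCharacters("a");
            w.writeEntityRef("#xa");
            w.writeCharacters("b");
            w.writeEndElement();
            w.flush();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints <note>a, a real line break, then b</note>, with no character reference
        System.out.println(demo());
    }
}
```

The wrapped writer would then be passed to Marshaller.marshal(Object, XMLStreamWriter) in place of the original one. Note that writing a raw \r means a parser reading the file back will normalize it to \n (XML end-of-line handling), the same trade-off the Woodstox filter makes.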

zakmck answered Oct 18 '22 08:10
