
Why does CXF / JAXB read the whole InputStream into memory before marshalling to a SOAP message?

INFO - Sample code

I've set up sample code (an SSCCE) to help track down the problem:

https://github.com/ljader/test-cxf-base64-marshall

The problem

I'm integrating with a 3rd party JAX-WS service, so I cannot change the WSDL.

The 3rd party web service expects Base64-encoded bytes to perform some operation on them, and it expects the client to send all of the bytes inside the SOAP message. They don't want to switch to MTOM / XOP, so I'm stuck with the current requirements.

I decided to use CXF to easily set up a sample client, and it worked fine for small files.

But when I try to send BIG data, e.g. 200MB, CXF/JAXB throws an exception:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.sun.xml.bind.v2.util.ByteArrayOutputStreamEx.readFrom(ByteArrayOutputStreamEx.java:75)
at com.sun.xml.bind.v2.runtime.unmarshaller.Base64Data.get(Base64Data.java:196)
at com.sun.xml.bind.v2.runtime.unmarshaller.Base64Data.writeTo(Base64Data.java:312)
at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.text(UTF8XmlOutput.java:312)
at com.sun.xml.bind.v2.runtime.XMLSerializer.leafElement(XMLSerializer.java:356)
at com.sun.xml.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl$PcdataImpl.writeLeafElement(RuntimeBuiltinLeafInfoImpl.java:191)
at com.sun.xml.bind.v2.runtime.MimeTypedTransducer.writeLeafElement(MimeTypedTransducer.java:96)
at com.sun.xml.bind.v2.runtime.reflect.TransducedAccessor$CompositeTransducedAccessorImpl.writeLeafElement(TransducedAccessor.java:254)
at com.sun.xml.bind.v2.runtime.property.SingleElementLeafProperty.serializeBody(SingleElementLeafProperty.java:130)
at com.sun.xml.bind.v2.runtime.ClassBeanInfoImpl.serializeBody(ClassBeanInfoImpl.java:360)
at com.sun.xml.bind.v2.runtime.XMLSerializer.childAsXsiType(XMLSerializer.java:696)
at com.sun.xml.bind.v2.runtime.ElementBeanInfoImpl$1.serializeBody(ElementBeanInfoImpl.java:155)
at com.sun.xml.bind.v2.runtime.ElementBeanInfoImpl$1.serializeBody(ElementBeanInfoImpl.java:130)
at com.sun.xml.bind.v2.runtime.ElementBeanInfoImpl.serializeBody(ElementBeanInfoImpl.java:332)
at com.sun.xml.bind.v2.runtime.ElementBeanInfoImpl.serializeRoot(ElementBeanInfoImpl.java:339)
at com.sun.xml.bind.v2.runtime.ElementBeanInfoImpl.serializeRoot(ElementBeanInfoImpl.java:75)
at com.sun.xml.bind.v2.runtime.XMLSerializer.childAsRoot(XMLSerializer.java:494)
at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:323)
at com.sun.xml.bind.v2.runtime.MarshallerImpl.marshal(MarshallerImpl.java:251)
at javax.xml.bind.helpers.AbstractMarshallerImpl.marshal(AbstractMarshallerImpl.java:95)
at org.apache.cxf.jaxb.JAXBEncoderDecoder.writeObject(JAXBEncoderDecoder.java:617)
at org.apache.cxf.jaxb.JAXBEncoderDecoder.marshall(JAXBEncoderDecoder.java:241)
at org.apache.cxf.jaxb.io.DataWriterImpl.write(DataWriterImpl.java:237)
at org.apache.cxf.interceptor.AbstractOutDatabindingInterceptor.writeParts(AbstractOutDatabindingInterceptor.java:117)
at org.apache.cxf.wsdl.interceptors.BareOutInterceptor.handleMessage(BareOutInterceptor.java:68)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at org.apache.cxf.endpoint.ClientImpl.doInvoke(ClientImpl.java:514)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:423)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:324)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:277)
at org.apache.cxf.frontend.ClientProxy.invokeSync(ClientProxy.java:96)
at org.apache.cxf.jaxws.JaxWsClientProxy.invoke(JaxWsClientProxy.java:139)

My findings

I've tracked the problem down: based on the XSD type "base64Binary", com.sun.xml.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl decides that com.sun.xml.bind.v2.runtime.unmarshaller.Base64Data should handle marshalling of the data from a javax.activation.DataHandler.

During marshalling, the WHOLE content of the underlying InputStream is read into memory (see http://grepcode.com/file/repo1.maven.org/maven2/com.sun.xml.bind/jaxb-impl/2.2.11/com/sun/xml/bind/v2/runtime/unmarshaller/Base64Data.java/#311 ), which causes the OutOfMemoryError.

Problem

CXF uses JAXB to marshal Java objects into SOAP messages. When marshalling an InputStream, the WHOLE input stream is read into memory before being converted into Base64 binary.

So I want to send ("stream") data from client to server in chunks (the OutputStream in the marshaller is a direct wrapper around the HttpURLConnection), so that my client can handle sending any amount of data.

Streaming is IMHO especially desirable when many threads are using my client.
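
To illustrate what "sending in chunks" means here, below is a minimal sketch of Base64-encoding a stream with a fixed-size buffer. It uses only java.util.Base64 from the JDK (Java 8+), not CXF or JAXB, and the class name Base64Streamer is hypothetical. It is not a drop-in fix for the marshaller, just the behaviour I would like it to have.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; not part of CXF or JAXB.
final class Base64Streamer {

    private Base64Streamer() {
    }

    // Copies `in` to `out` as Base64 text using a fixed 8 KB buffer, so memory
    // use does not grow with the size of the input.
    // Returns the number of raw (unencoded) bytes read.
    static long encode(InputStream in, OutputStream out) {
        byte[] buffer = new byte[8192];
        long total = 0;
        // Closing the wrapping stream writes the final (padded) Base64 group.
        // Note that it also closes the underlying `out`.
        try (OutputStream base64 = Base64.getEncoder().wrap(out)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                base64.write(buffer, 0, read);
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] payload = "hello streaming world".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long count = encode(new ByteArrayInputStream(payload), sink);
        System.out.println(count + " raw bytes encoded as: " + sink);
    }
}
```

In a real client the `out` argument would be the connection's output stream rather than a ByteArrayOutputStream, which is used here only to keep the sketch runnable.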

I don't have much JAX-WS/CXF/JAXB knowledge, hence the question.

The only materials I found that may be useful are:

Can JAXB parse large XML files in chunks

http://rezarahim.blogspot.com/2010/05/chunking-out-big-xml-with-stax-and-jaxb.html

The questions

  1. Why does CXF/JAXB load the whole InputStream into memory? Isn't the purpose of DataHandler to prevent exactly that?

  2. Do you know any way to change JAXB's behaviour so that it marshals an InputStream differently?

  3. Do you know of other marshallers which can handle marshalling such big data?

  4. As a last resort, do you have links to any materials on how to create a custom marshaller which would stream the data directly to the server?

ljader asked Aug 10 '15 19:08


1 Answer

You don't need a custom marshaller or any change to JAXB's behaviour to achieve what you need - DataHandler is your friend here.

Answering your first question: JAXB needs to keep all data in memory because it has to resolve references.

I know you can't change the WSDL references, etc. But you still have your client's copy of the WSDL in your project in order to generate the client classes, don't you? So what you can do (I haven't tested this with a third party's WSDL, but it might be worth trying) is add xmime:expectedContentTypes="application/octet-stream" to the response XSD element which returns the Base64-encoded data. For example:

<xsd:element name="generateBigDataResponse">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element name="result"
                         type="xsd:base64Binary"
                         minOccurs="0"
                         maxOccurs="1"
                         xmime:expectedContentTypes="application/octet-stream"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>

Also, do not forget to add the namespace xmlns:xmime="http://www.w3.org/2005/05/xmlmime" to the xsd:schema element.

What you are doing here is not changing any WSDL references, just telling JAXB to generate a DataHandler instead of a byte[]. This is what it looks like once you generate your classes that way:

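@Override
public DataHandler generateBigData() {
    try {
        // Connect both ends of the pipe: bytes written to the output stream
        // become readable from the input stream.
        final PipedOutputStream pipedOutputStream = new PipedOutputStream();
        PipedInputStream pipedInputStream = new PipedInputStream(pipedOutputStream);
        InputStreamDataSource dataSource = new InputStreamDataSource(pipedInputStream, "application/octet-stream");

        // Write on another thread (executor is assumed to be a field of this class);
        // using both ends of a pipe from a single thread may deadlock.
        executor.execute(new Runnable() {

            @Override
            public void run() {
                //write your stuff here into pipedOutputStream, then close it
            }
        });

        return new DataHandler(dataSource);
    } catch (IOException e) {
        // A return or throw is required here, otherwise the method does not compile.
        throw new IllegalStateException("Could not set up the piped streams", e);
    }
}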

You get DataHandler as the response type thanks to xmime. I suggest you use PipedOutputStream, but make sure you do the writing in a different thread:

A piped output stream can be connected to a piped input stream to create a communications pipe. The piped output stream is the sending end of the pipe. Typically, data is written to a PipedOutputStream object by one thread and data is read from the connected PipedInputStream by some other thread. Attempting to use both objects from a single thread is not recommended as it may deadlock the thread. The pipe is said to be broken if a thread that was reading data bytes from the connected piped input stream is no longer alive.

You then connect it to a PipedInputStream, whose instance goes into the constructor of InputStreamDataSource, which you in turn pass to a DataHandler and return. This way your file is written in chunks and you won't get that exception. What's more, the client will not hit a timeout, because data starts flowing right away.
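
The piped-stream hand-off described above can be tried in isolation with only JDK classes. The following is a minimal sketch, not the service code itself: the class name PipedStreamDemo and the methods produce and drain are hypothetical, and the writer just emits zero-filled chunks where real code would write file data.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; names are for illustration only.
final class PipedStreamDemo {

    private PipedStreamDemo() {
    }

    // Returns the reading end of a pipe. A background task writes `totalBytes`
    // bytes into the writing end, `chunkSize` bytes at a time.
    static InputStream produce(ExecutorService executor, long totalBytes, int chunkSize) {
        try {
            PipedOutputStream out = new PipedOutputStream();
            // The pipe buffer is bounded, so the writer blocks until the reader catches up.
            PipedInputStream in = new PipedInputStream(out, chunkSize);
            executor.execute(() -> {
                byte[] chunk = new byte[chunkSize];
                // Closing the writing end lets the reader see end-of-stream.
                try (OutputStream sink = out) {
                    long remaining = totalBytes;
                    while (remaining > 0) {
                        int n = (int) Math.min(chunk.length, remaining);
                        sink.write(chunk, 0, n);
                        remaining -= n;
                    }
                } catch (IOException e) {
                    // The reader closed its end early; nothing more to write.
                }
            });
            return in;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the stream to the end with a small buffer and returns the byte count.
    static long drain(InputStream in) {
        byte[] buffer = new byte[4096];
        long total = 0;
        try (InputStream source = in) {
            int read;
            while ((read = source.read(buffer)) != -1) {
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            long count = drain(produce(executor, 50_000_000L, 8192));
            System.out.println("Read " + count + " bytes through an 8 KB pipe");
        } finally {
            executor.shutdown();
        }
    }
}
```

Because the pipe buffer is bounded, memory use stays near the chunk size no matter how many bytes pass through, which is the property the DataHandler approach relies on.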

Hope this helps.

Paulius Matulionis answered Sep 23 '22 15:09