Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I drop an inbound XML element with a Transform in CXF?

I'm using CXF (v2.7.10) in a client that is consuming the MS Exchange Web Service (EWS).

I'm finding that one of the elements returned by EWS (UniqueHash) contains characters that are invalid in XML v1.0. As I have no control over this I'm trying to use an inbound interceptor to drop the UniqueHash elements (I don't need them) like this:

Map<String, String> inTransformMap = Collections.singletonMap(
        "{http://schemas.microsoft.com/exchange/services/2006/types}UniqueHash", "");
TransformInInterceptor transformInInterceptor = new TransformInInterceptor();
transformInInterceptor.setInTransformElements(inTransformMap);
client.getInInterceptors().add(transformInInterceptor);

I can see that the transform (TransformInInterceptor) is running nice and early (post-stream):

FINE: Chain org.apache.cxf.phase.PhaseInterceptorChain@be78549 was created. Current flow:
  receive [PolicyInInterceptor, LoggingInInterceptor, AttachmentInInterceptor]
  post-stream [TransformInInterceptor, StaxInInterceptor]
  read [WSDLGetInterceptor, ReadHeadersInterceptor, SoapActionInInterceptor, StartBodyInterceptor]
  pre-protocol [MustUnderstandInterceptor]
  post-protocol [CheckFaultInterceptor, JAXBAttachmentSchemaValidationHack]
  unmarshal [DocLiteralInInterceptor, SoapHeaderInterceptor]
  post-logical [WrapperClassInInterceptor]
  pre-invoke [SwAInInterceptor, HolderInInterceptor]

But even though it appears to be working as intended stepping through the code, when DocLiteralInInterceptor fires later on it throws this unmarshalling error (0x4 in this case is within the UniqueHash element I thought I'd dropped):

org.apache.cxf.interceptor.Fault: Unmarshalling Error: Illegal character entity: expansion character (code 0x4
 at [row,col {unknown-source}]: [1,2230] 
    at org.apache.cxf.jaxb.JAXBEncoderDecoder.unmarshall(JAXBEncoderDecoder.java:881)
    at org.apache.cxf.jaxb.JAXBEncoderDecoder.unmarshall(JAXBEncoderDecoder.java:702)
    at org.apache.cxf.jaxb.io.DataReaderImpl.read(DataReaderImpl.java:160)
    at org.apache.cxf.interceptor.DocLiteralInInterceptor.handleMessage(DocLiteralInInterceptor.java:192)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:272)
    at org.apache.cxf.endpoint.ClientImpl.onMessage(ClientImpl.java:835)
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.handleResponseInternal(HTTPConduit.java:1614)
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.handleResponse(HTTPConduit.java:1504)
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.close(HTTPConduit.java:1310)
    at org.apache.cxf.transport.http.asyncclient.AsyncHTTPConduit$AsyncWrappedOutputStream.close(AsyncHTTPConduit.java:381)
    at org.apache.cxf.io.CacheAndWriteOutputStream.postClose(CacheAndWriteOutputStream.java:50)
    at org.apache.cxf.io.CachedOutputStream.close(CachedOutputStream.java:223)
    at org.apache.cxf.transport.AbstractConduit.close(AbstractConduit.java:56)
    at org.apache.cxf.transport.http.HTTPConduit.close(HTTPConduit.java:628)
    at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:62)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:272)
    at org.apache.cxf.endpoint.ClientImpl.doInvoke(ClientImpl.java:565)
    at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:474)
    at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:377)
    at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:330)
    at org.apache.cxf.frontend.ClientProxy.invokeSync(ClientProxy.java:96)
    at org.apache.cxf.jaxws.JaxWsClientProxy.invoke(JaxWsClientProxy.java:135)
    at com.sun.proxy.$Proxy67.searchMailboxes(Unknown Source)
Caused by: javax.xml.bind.UnmarshalException

Here's the XML response I'm working with:

<?xml version="1.0" encoding="utf-8"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Header>
<h:ServerVersionInfo MajorVersion="15" MinorVersion="0" MajorBuildNumber="847" MinorBuildNumber="31" Version="V2_8" xmlns:h="http://schemas.microsoft.com/exchange/services/2006/types" xmlns="http://schemas.microsoft.com/exchange/services/2006/types" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
</s:Header>
<s:Body xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<m:SearchMailboxesResponse xmlns:m="http://schemas.microsoft.com/exchange/services/2006/messages" xmlns:t="http://schemas.microsoft.com/exchange/services/2006/types">
<m:ResponseMessages>
<m:SearchMailboxesResponseMessage ResponseClass="Success">
<m:ResponseCode>NoError</m:ResponseCode>
<m:SearchMailboxesResult>
<t:SearchQueries>
<t:MailboxQuery>
<t:Query>"general quarters"</t:Query>
<t:MailboxSearchScopes>
<t:MailboxSearchScope>
<t:Mailbox>/o=First Organization/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=6f8abfc1a1694cf299c7b3ae5522d8c4-John</t:Mailbox>
<t:SearchScope>All</t:SearchScope>
</t:MailboxSearchScope>
</t:MailboxSearchScopes>
</t:MailboxQuery>
</t:SearchQueries>
<t:ResultType>PreviewOnly</t:ResultType>
<t:ItemCount>1</t:ItemCount>
<t:Size>3169</t:Size>
<t:PageItemCount>1</t:PageItemCount>
<t:PageItemSize>3169</t:PageItemSize>
<t:Items>
<t:SearchPreviewItem>
<t:Id Id="AAMkADY4MDY1MWViLTMzMWItNDEyYi1iMjUzLTQ2ZjMwNWVkYmIzYQBGAAAAAABkY13xq9IqS5OySCQXk7W3BwC9AjA7QbibQa9DQZUO2Dm3AAAAAAAMAAC9AjA7QbibQa9DQZUO2Dm3AAAE/bU4AAA=" ChangeKey="CQAAABYAAAC9AjA7QbibQa9DQZUO2Dm3AAAE/ceM"/>
<t:Mailbox>
<t:MailboxId>/o=First Organization/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=6f8abfc1a1694cf299c7b3ae5522d8c4-John</t:MailboxId>
<t:PrimarySmtpAddress>[email protected]</t:PrimarySmtpAddress>
</t:Mailbox>
<t:ParentId Id="AQMkADY4MDY1MWViLTMzADFiLTQxMmItYjI1My00NmYzMDVlZGJiADNhAC4AAANkY13xq9IqS5OySCQXk7W3AQC9AjA7QbibQa9DQZUO2Dm3AAADDAAAAA==" ChangeKey="AQAAAA=="/>
<t:ItemClass>IPM.Note</t:ItemClass>
<t:UniqueHash>00036&lt;[email protected]&gt;0010General Quarters000C&#x4;&#x0;&#x0;&#x0;?&#x19;"&#x10;&#x12;{&#xB;&#x17;</t:UniqueHash>
<t:SortValue>001B2014-03-11T19:42:42.00000000000006F00001182</t:SortValue>
<t:OwaLink>https://ex01.internal.local/owa/integrated/?viewmodel=ItemReadingPaneViewModelPopOutFactory&amp;IsDiscoveryView=1&amp;exsvurl=1&amp;ItemID=AAMkADY4MDY1MWViLTMzMWItNDEyYi1iMjUzLTQ2ZjMwNWVkYmIzYQBGAAAAAABkY13xq9IqS5OySCQXk7W3BwC9AjA7QbibQa9DQZUO2Dm3AAAAAAAMAAC9AjA7QbibQa9DQZUO2Dm3AAAE%2FbU4AAA%3D</t:OwaLink>
<t:Sender>John Smith</t:Sender>
<t:ToRecipients>
<t:SmtpAddress>Our Shared Mailbox</t:SmtpAddress>
</t:ToRecipients>
<t:CreatedTime>2014-03-11T19:42:42Z</t:CreatedTime>
<t:ReceivedTime>2014-03-11T19:42:42Z</t:ReceivedTime>
<t:SentTime>2014-03-11T19:42:42Z</t:SentTime>
<t:Subject>General Quarters</t:Subject>
<t:Size>3169</t:Size>
<t:Preview/>
<t:Importance>Normal</t:Importance>
<t:Read>true</t:Read>
<t:HasAttachment>false</t:HasAttachment>
</t:SearchPreviewItem>
</t:Items>
<t:MailboxStats>
<t:MailboxStat>
<t:MailboxId>/o=First Organization/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=6f8abfc1a1694cf299c7b3ae5522d8c4-John</t:MailboxId>
<t:DisplayName>John Smith</t:DisplayName>
<t:ItemCount>1</t:ItemCount>
<t:Size>3169</t:Size>
</t:MailboxStat>
</t:MailboxStats>
</m:SearchMailboxesResult>
</m:SearchMailboxesResponseMessage>
</m:ResponseMessages>
</m:SearchMailboxesResponse>
</s:Body>
</s:Envelope>

Does anyone know what I'm doing wrong here? Any pointers on how I get rid of this element and it's troublesome content?

like image 225
BigMikeW Avatar asked Mar 17 '14 08:03

BigMikeW


2 Answers

Figured it out (and thanks to Daniel Kulp for confirming it via the cxf users mailing list).

The issue was that the InTransformReader extends DepthXMLStreamReader. This means that even though I was trying to drop or replace invalid characters, the TransformInInterceptor would first attempt to unmarshall them anyway.

The solution was to create a new Interceptor that extended AbstractPhaseInterceptor and filter out the invalid text using a regex during the PRE_STREAM phase, before the StaxInInterceptor was invoked.

Easy once you know how!

Example:

The following will remove invalid XML chars from a soap message:

import org.apache.cxf.interceptor.Fault;
import org.apache.cxf.message.Message;
import org.apache.cxf.phase.AbstractPhaseInterceptor;
import org.apache.cxf.phase.Phase;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.commons.io.IOUtils;
import org.apache.cxf.io.CachedOutputStream;

public class InvalidCharInterceptor extends AbstractPhaseInterceptor<Message> {

  public InvalidCharInterceptor() {
    super(Phase.PRE_STREAM);
  }

  /**
   * From xml spec valid chars:<br>
   * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]<br>
   * any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.<br>
   * 
   * @param text
   *          The String to clean
   * @param replacement
   *          The string to be substituted for each match
   * @return The resulting String
   */
  public static String cleanInvalidXmlChars(String text, String replacement) {
    String re = "[^\\x09\\x0A\\x0D\\x20-\\xD7FF\\xE000-\\xFFFD\\x10000-x10FFFF]";
    return text.replaceAll(re, replacement);
  }

  @Override
  public void handleMessage(Message message) throws Fault {
    boolean isOutbound = false;
    isOutbound = message == message.getExchange().getOutMessage()
        || message == message.getExchange().getOutFaultMessage();

    if (isOutbound) {
      OutputStream os = message.getContent(OutputStream.class);

      CachedOutputStream cs = new CachedOutputStream();
      message.setContent(OutputStream.class, cs);

      message.getInterceptorChain().doIntercept(message);

      try {
        cs.flush();
        IOUtils.closeQuietly(cs);
        CachedOutputStream csnew = (CachedOutputStream) message.getContent(OutputStream.class);

        String currentEnvelopeMessage = IOUtils.toString(csnew.getInputStream(), "UTF-8");
        csnew.flush();
        IOUtils.closeQuietly(csnew);

        String res = cleanInvalidXmlChars(currentEnvelopeMessage, "");
        res = res != null ? res : currentEnvelopeMessage;

        InputStream replaceInStream = IOUtils.toInputStream(res, "UTF-8");

        IOUtils.copy(replaceInStream, os);
        replaceInStream.close();
        IOUtils.closeQuietly(replaceInStream);

        os.flush();
        message.setContent(OutputStream.class, os);
        IOUtils.closeQuietly(os);

      } catch (IOException ioe) {
        throw new RuntimeException(ioe);
      }
    }
  }

}

Then you add it to your client:

client.getOutInterceptors().add(new InvalidCharInterceptor());

like image 59
BigMikeW Avatar answered Nov 14 '22 22:11

BigMikeW


OK, just in case it helps someone, here is how I solve my problem, that seems to fit the OP on inbound message containing invalid chars. Actually, the accepted answers did not work in my case, because it seems to handle outbound traffic !?

Here is how I rewrote the code to make it work in my case (inbound SOAP WS response containing invalid chars, that triggers unmarshalling exception.

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.cxf.interceptor.Fault;
import org.apache.cxf.message.Message;
import org.apache.cxf.phase.AbstractPhaseInterceptor;
import org.apache.cxf.phase.Phase;

import com.google.common.io.ByteStreams;

/**
 * This Adapter is meant to intercept XML response and remove invalid characters
 */
public class InvalidCharInterceptor extends AbstractPhaseInterceptor<Message> {

    // https://stackoverflow.com/a/4237934/2143734

    // Valid Characters in XML 1.0
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // now we create a regexp filtering the opposite, i.e. all invalid char intervals
    private static final Pattern XML_1_0_FORBIDDEN_CHARS_PATTERN = Pattern
        .compile("[^" + "\u0009\r\n" + "\u0020-\uD7FF" + "\uE000-\uFFFD" + "\ud800\udc00-\udbff\udfff" + "]");

    /**
     * used by cxf.xml to create the interceptor bean
     * <a href="http://cxf.apache.org/docs/interceptors.html">It register to the RECEIVE phase in cxf flow</a>
     */
    public InvalidCharInterceptor() {
        super(Phase.RECEIVE);
    }

    /**
     * Remove all XML 1.0 invalid characters from the String
     * 
     * @param s
     *            the string to process
     * @return the purged string
     */
    private static String removeForbiddenChars(String s) {
        Matcher m = XML_1_0_FORBIDDEN_CHARS_PATTERN.matcher(s);
        return m.replaceAll("");
    }

    @Override
    public void handleMessage(Message message) {
        boolean isInbound = message == message.getExchange().getInMessage()
            || message == message.getExchange().getInFaultMessage();

        if (isInbound) {
            try (InputStream is = message.getContent(InputStream.class);) {
                if (is != null) {
                    byte[] rawContent = ByteStreams.toByteArray(is);
                    String content = new String(rawContent, StandardCharsets.UTF_8);
                    String cleanContent = removeForbiddenChars(content);
                    message.setContent(InputStream.class,
                        new ByteArrayInputStream(cleanContent.getBytes(StandardCharsets.UTF_8)));
                }
            } catch (IOException e) {
                throw new Fault(e);
            }
        }
    }
}

I then use this class in the cxf definition

  <bean id="invalidCharInterceptor" class="my.package.InvalidCharInterceptor" />

  <bean id="cxf" class="org.apache.cxf.bus.spring.SpringBus">
    <property name="inInterceptors">
      <ref bean="invalidCharInterceptor" />
    </property>
  </bean>
like image 24
Zzirconium Avatar answered Nov 14 '22 20:11

Zzirconium