I'm using Storm 0.8.1 to read incoming messages off an Amazon SQS queue and am getting consistent exceptions when doing so:
    2013-12-02 02:21:38 executor [ERROR] java.lang.RuntimeException: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[1,1] Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.)
        at REDACTED.spouts.SqsQueueSpout.handleNextTuple(SqsQueueSpout.java:219)
        at REDACTED.spouts.SqsQueueSpout.nextTuple(SqsQueueSpout.java:88)
        at backtype.storm.daemon.executor$fn__3976$fn__4017$fn__4018.invoke(executor.clj:447)
        at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Thread.java:701)
    Caused by: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[1,1] Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.)
        at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:524)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:298)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:167)
        at com.amazonaws.services.sqs.AmazonSQSClient.invoke(AmazonSQSClient.java:812)
        at com.amazonaws.services.sqs.AmazonSQSClient.receiveMessage(AmazonSQSClient.java:575)
        at REDACTED.spouts.SqsQueueSpout.handleNextTuple(SqsQueueSpout.java:191)
        ... 5 more
    Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1] Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.setInputSource(XMLStreamReaderImpl.java:219)
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.<init>(XMLStreamReaderImpl.java:189)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.getXMLStreamReaderImpl(XMLInputFactoryImpl.java:277)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLStreamReader(XMLInputFactoryImpl.java:129)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLEventReader(XMLInputFactoryImpl.java:78)
        at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:85)
        at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:41)
        at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:503)
        ... 10 more
I've debugged the data on the queue and everything looks good. I can't figure out why the API's XML response would be causing these problems. Any ideas?
It sounds like a worker is failing to process the messages from the queue correctly. When a worker (or app) retrieves a message from the queue, it needs to call DeleteMessage once it has finished processing; this removes the message from the queue.
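For reference, a minimal receive-then-delete loop with the AWS SDK for Java 1.x might look like the sketch below. The queue URL is a placeholder, and process() stands in for your own handler:

    import com.amazonaws.services.sqs.AmazonSQSClient;
    import com.amazonaws.services.sqs.model.DeleteMessageRequest;
    import com.amazonaws.services.sqs.model.Message;
    import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

    public class SqsConsumer {
        public static void main(String[] args) {
            AmazonSQSClient sqs = new AmazonSQSClient(); // uses the default credential chain
            String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"; // placeholder

            for (Message message : sqs.receiveMessage(new ReceiveMessageRequest(queueUrl)).getMessages()) {
                process(message); // your handler goes here
                // Delete only after successful processing; otherwise the message
                // reappears once its visibility timeout expires.
                sqs.deleteMessage(new DeleteMessageRequest(queueUrl, message.getReceiptHandle()));
            }
        }

        private static void process(Message message) {
            System.out.println(message.getBody());
        }
    }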
If you instead want to clear a queue entirely, you can purge it: log in to the AWS Management Console, choose Amazon SQS, select the queue, and choose "Purge Queue" from the Queue Actions menu. This removes all messages from the queue. You can also purge queues using the AWS SDKs or command-line tools.
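If your SDK version is recent enough to include the PurgeQueue API, the programmatic version is short; a sketch, again with a placeholder queue URL:

    import com.amazonaws.services.sqs.AmazonSQSClient;
    import com.amazonaws.services.sqs.model.PurgeQueueRequest;

    public class PurgeQueueExample {
        public static void main(String[] args) {
            AmazonSQSClient sqs = new AmazonSQSClient();
            // Deletes every message currently in the queue; there is no undo.
            sqs.purgeQueue(new PurgeQueueRequest()
                    .withQueueUrl("https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"));
        }
    }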
SQS lets you decouple application components so that they run and fail independently, increasing the overall fault tolerance of the system. Multiple copies of every message are stored redundantly across multiple Availability Zones so that they are available whenever needed.
To prevent other consumers from processing a message that is already being handled, Amazon SQS sets a visibility timeout: a period during which other consumers cannot receive or process that message. The default visibility timeout is 30 seconds; the minimum is 0 seconds and the maximum is 12 hours.
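As an illustration, the timeout can be overridden per receive, or extended mid-processing, with the AWS SDK for Java. The queue URL and timeout values below are placeholders:

    import com.amazonaws.services.sqs.AmazonSQSClient;
    import com.amazonaws.services.sqs.model.Message;
    import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

    public class VisibilityTimeoutExample {
        public static void main(String[] args) {
            AmazonSQSClient sqs = new AmazonSQSClient();
            String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"; // placeholder

            // Override the queue's default visibility timeout for this receive only:
            // these messages stay hidden from other consumers for 60 seconds.
            ReceiveMessageRequest request = new ReceiveMessageRequest(queueUrl).withVisibilityTimeout(60);
            for (Message message : sqs.receiveMessage(request).getMessages()) {
                // If processing runs long, buy more time for this one message.
                sqs.changeMessageVisibility(queueUrl, message.getReceiptHandle(), 120);
            }
        }
    }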
Answering my own question here for the ages.
There's currently an XML entity expansion limit bug in Oracle's and OpenJDK's Java: the expansion counter is effectively shared across parses, so it hits the default upper bound of 64,000 once enough XML documents have been parsed.
Although I thought that our version (6b27-1.12.6-1ubuntu0.12.04.4) wasn't affected, running the sample code given in the OpenJDK bug report did indeed verify that we were susceptible to the bug.
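For anyone who wants to test their own JDK, the gist of the reproduction is parsing many tiny documents, each expanding a single entity, through one XMLInputFactory. The sketch below is in the spirit of the bug report's sample, not its exact code; on an affected JDK it should die with the JAXP00010001 error after roughly 64,000 iterations:

    import java.io.StringReader;
    import javax.xml.stream.XMLEventReader;
    import javax.xml.stream.XMLInputFactory;

    public class EntityExpansionRepro {
        public static void main(String[] args) throws Exception {
            // A tiny document that expands exactly one entity per parse.
            String xml = "<!DOCTYPE a [<!ENTITY e \"x\">]><a>&e;</a>";
            XMLInputFactory factory = XMLInputFactory.newInstance();

            // On an affected JDK the expansion count effectively accumulates
            // across documents, so ~64000 one-entity parses trip the limit.
            for (int i = 0; i < 100000; i++) {
                XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
                while (reader.hasNext()) {
                    reader.nextEvent();
                }
                reader.close();
            }
        }
    }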
To work around the issue, I needed to pass jdk.xml.entityExpansionLimit=0 to the Storm workers. By adding the following to storm.yaml across my cluster, I was able to mitigate this problem:
    supervisor.childopts: "-Djdk.xml.entityExpansionLimit=0"
    worker.childopts: "-Djdk.xml.entityExpansionLimit=0"
I should note that disabling the limit technically opens you up to a denial-of-service attack, but since our XML documents only come from SQS, I'm not worried about someone forging malicious XML to kill our workers.