 

What's causing these ParseError exceptions when reading off an AWS SQS queue in my Storm cluster?


I'm using Storm 0.8.1 to read incoming messages off an Amazon SQS queue and am getting consistent exceptions when doing so:

2013-12-02 02:21:38 executor [ERROR] java.lang.RuntimeException: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[1,1] Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.)
        at REDACTED.spouts.SqsQueueSpout.handleNextTuple(SqsQueueSpout.java:219)
        at REDACTED.spouts.SqsQueueSpout.nextTuple(SqsQueueSpout.java:88)
        at backtype.storm.daemon.executor$fn__3976$fn__4017$fn__4018.invoke(executor.clj:447)
        at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Thread.java:701)
Caused by: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[1,1] Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.)
        at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:524)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:298)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:167)
        at com.amazonaws.services.sqs.AmazonSQSClient.invoke(AmazonSQSClient.java:812)
        at com.amazonaws.services.sqs.AmazonSQSClient.receiveMessage(AmazonSQSClient.java:575)
        at REDACTED.spouts.SqsQueueSpout.handleNextTuple(SqsQueueSpout.java:191)
        ... 5 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1] Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.setInputSource(XMLStreamReaderImpl.java:219)
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.<init>(XMLStreamReaderImpl.java:189)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.getXMLStreamReaderImpl(XMLInputFactoryImpl.java:277)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLStreamReader(XMLInputFactoryImpl.java:129)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLEventReader(XMLInputFactoryImpl.java:78)
        at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:85)
        at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:41)
        at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:503)
        ... 10 more
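For context, the spout's receive path is nothing exotic; it boils down to a plain ReceiveMessage call through the AWS SDK for Java, roughly like this (a simplified sketch, not the exact code; class and field names are illustrative):

import java.util.List;

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class SqsReceiveSketch {
    private final AmazonSQS sqs;
    private final String queueUrl;

    public SqsReceiveSketch(AmazonSQS sqs, String queueUrl) {
        this.sqs = sqs;
        this.queueUrl = queueUrl;
    }

    // Called from the spout's nextTuple(); the exception above is thrown while
    // the SDK parses the XML response to this receiveMessage() call.
    public List<Message> poll() {
        ReceiveMessageRequest request = new ReceiveMessageRequest(queueUrl)
                .withMaxNumberOfMessages(10);
        return sqs.receiveMessage(request).getMessages();
    }
}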

I've debugged the data on the queue and everything looks good. I can't figure out why the API's XML response would be causing these problems. Any ideas?

Asked Dec 09 '13 by Joel Rosenberg

People also ask

Why are messages stuck in SQS?

It sounds like a worker is failing to correctly process the messages from the queue. When a worker (or app) retrieves a message from the queue, it needs to call DeleteMessage() when it has finished processing. This removes it from the queue.
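A minimal sketch of that receive/process/delete cycle with the AWS SDK for Java (the queue URL and processing logic are placeholders):

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class ConsumeAndDelete {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"; // placeholder

        for (Message m : sqs.receiveMessage(new ReceiveMessageRequest(queueUrl)).getMessages()) {
            process(m); // your processing logic

            // Delete only after successful processing; otherwise the message
            // becomes visible again once its visibility timeout expires.
            sqs.deleteMessage(queueUrl, m.getReceiptHandle());
        }
    }

    private static void process(Message m) {
        System.out.println(m.getBody());
    }
}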

How do I clean my SQS queue?

To purge a queue, log in to the AWS Management Console and choose Amazon SQS. Then, select a queue, and choose “Purge Queue” from the Queue Actions menu. The queue will then be cleared of all messages. You can also purge queues using the AWS SDKs or command-line tools.
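With the AWS SDK for Java, a purge is a single call; a minimal sketch (the queue URL is a placeholder):

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.PurgeQueueRequest;

public class PurgeExample {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"; // placeholder

        // Deletes every message in the queue; SQS allows one purge per queue every 60 seconds.
        sqs.purgeQueue(new PurgeQueueRequest().withQueueUrl(queueUrl));
    }
}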

Is SQS fault tolerance?

SQS lets you decouple application components so that they run and fail independently, increasing the overall fault tolerance of the system. Multiple copies of every message are stored redundantly across multiple Availability Zones so that they are available whenever needed.

What is the purpose of the SQS message visibility timeout?

To prevent other consumers from processing the message again, Amazon SQS sets a visibility timeout, a period of time during which Amazon SQS prevents other consumers from receiving and processing the message. The default visibility timeout for a message is 30 seconds. The minimum is 0 seconds. The maximum is 12 hours.
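A short sketch of working with the visibility timeout from the AWS SDK for Java, both per receive call and for an individual in-flight message (the queue URL is a placeholder):

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class VisibilityTimeoutExample {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"; // placeholder

        // Hide received messages from other consumers for 120 seconds
        // instead of the 30-second default.
        ReceiveMessageRequest request = new ReceiveMessageRequest(queueUrl)
                .withVisibilityTimeout(120);

        for (Message m : sqs.receiveMessage(request).getMessages()) {
            // If processing runs long, extend the timeout for this specific
            // message before it expires so no other consumer picks it up.
            sqs.changeMessageVisibility(queueUrl, m.getReceiptHandle(), 300);
        }
    }
}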


1 Answer

Answering my own question here for the ages.

There's currently an XML entity expansion limit bug in Oracle's and OpenJDK's Java: the JAXP parser's entity expansion counter is shared across parses rather than being reset for each document, so a long-running process that parses many XML documents, such as a Storm worker polling SQS, eventually hits the default 64,000 limit even though each individual response is well under it.

  1. https://blogs.oracle.com/joew/entry/jdk_7u45_aws_issue_123
  2. https://bugs.openjdk.java.net/browse/JDK-8028111
  3. https://github.com/aws/aws-sdk-java/issues/123

Although I thought our version (6b27-1.12.6-1ubuntu0.12.04.4) wasn't affected, running the sample code from the OpenJDK bug report confirmed that we were susceptible.
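If you want to check your own JVM, the gist of the check is to parse a large number of small XML documents from a single process and see whether the shared counter eventually trips the 64,000 limit. Here's a rough reconstruction along those lines (my sketch, not the exact code attached to the bug report; whether and how quickly it fails depends on what your JDK counts toward the limit):

import java.io.StringReader;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;

public class ExpansionLimitCheck {
    public static void main(String[] args) throws Exception {
        // A tiny document containing entity references; on an affected JDK the
        // expansions are accumulated across parses instead of per document.
        String xml = "<doc>&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;</doc>";

        XMLInputFactory factory = XMLInputFactory.newInstance();
        for (int i = 0; i < 100000; i++) {
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.nextEvent();
            }
            reader.close();
        }
        // An affected JDK throws the JAXP00010001 ParseError somewhere in the loop.
        System.out.println("No exception: this JVM does not appear to be affected.");
    }
}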

To work around the issue, I needed to pass jdk.xml.entityExpansionLimit=0 (a value of 0 disables the limit) to the Storm workers' JVMs. Adding the following to storm.yaml across my cluster mitigated the problem:

supervisor.childopts: "-Djdk.xml.entityExpansionLimit=0"
worker.childopts: "-Djdk.xml.entityExpansionLimit=0"
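If you would rather scope the flag to a single topology instead of editing storm.yaml on every node, the same JVM option can in principle be passed through topology.worker.childopts at submit time. A sketch, assuming your Storm version exposes Config.TOPOLOGY_WORKER_CHILDOPTS (spout and bolt wiring omitted); if it doesn't, stick with the storm.yaml route above:

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class SubmitWithExpansionFix {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // builder.setSpout("sqs-spout", ...);  // spout/bolt wiring omitted

        // Per-topology worker JVM options (assumes TOPOLOGY_WORKER_CHILDOPTS
        // is available in your Storm release).
        Config conf = new Config();
        conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Djdk.xml.entityExpansionLimit=0");

        StormSubmitter.submitTopology("sqs-topology", conf, builder.createTopology());
    }
}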

I should note that disabling the limit technically opens you up to an entity expansion denial-of-service attack (the limit exists to guard against payloads like the "billion laughs" document), but since our XML documents come only from SQS, I'm not worried about someone forging malevolent XML to kill our workers.

Answered Oct 31 '22 by Joel Rosenberg