"Content is not allowed in prolog" when parsing perfectly valid XML on GAE

Tags: java, parsing, xml, google-app-engine, stax

I've been beating my head against this absolutely infuriating bug for the last 48 hours, so I thought I'd finally throw in the towel and try asking here before I throw my laptop out the window.

I'm trying to parse the response XML from a call I made to AWS SimpleDB. The response is coming back on the wire just fine; for example, it may look like:

<?xml version="1.0" encoding="utf-8"?>
<ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/">
    <ListDomainsResult>
        <DomainName>Audio</DomainName>
        <DomainName>Course</DomainName>
        <DomainName>DocumentContents</DomainName>
        <DomainName>LectureSet</DomainName>
        <DomainName>MetaData</DomainName>
        <DomainName>Professors</DomainName>
        <DomainName>Tag</DomainName>
    </ListDomainsResult>
    <ResponseMetadata>
        <RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId>
        <BoxUsage>0.0000071759</BoxUsage>
    </ResponseMetadata>
</ListDomainsResponse>

I pass in this XML to a parser with

XMLEventReader eventReader = xmlInputFactory.createXMLEventReader(response.getContent());

and call eventReader.nextEvent(); a bunch of times to get the data I want.

Here's the bizarre part -- it works great inside the local server. The response comes in, I parse it, everyone's happy. The problem is that when I deploy the code to Google App Engine, the outgoing request still works, and the response XML seems 100% identical and correct to me, but the response fails to parse with the following exception:

com.amazonaws.http.HttpClient handleResponse: Unable to unmarshall response (ParseError at [row,col]:[1,1] Message: Content is not allowed in prolog.): <?xml version="1.0" encoding="utf-8"?>
<ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/"><ListDomainsResult><DomainName>Audio</DomainName><DomainName>Course</DomainName><DomainName>DocumentContents</DomainName><DomainName>LectureSet</DomainName><DomainName>MetaData</DomainName><DomainName>Professors</DomainName><DomainName>Tag</DomainName></ListDomainsResult><ResponseMetadata><RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId><BoxUsage>0.0000071759</BoxUsage></ResponseMetadata></ListDomainsResponse>
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown Source)
    at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(Unknown Source)
    at com.amazonaws.transform.StaxUnmarshallerContext.nextEvent(StaxUnmarshallerContext.java:153)
    ... (rest of lines omitted)

I have double, triple, quadruple checked this XML for 'invisible characters' or non-UTF8 encoded characters, etc. I looked at it byte-by-byte in an array for byte-order-marks or something of that nature. Nothing; it passes every validation test I could throw at it. Even stranger, it happens if I use a Saxon-based parser as well -- but ONLY on GAE, it always works fine in my local environment.
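The byte-level check described above could be sketched roughly as follows, so that the same inspection can be logged on GAE instead of only in a local debugger. This is a minimal sketch: the class name `LeadingBytes` and the method `peekHex` are made up for illustration and are not part of the AWS SDK or of the original code.

```java
import java.io.BufferedInputStream;
import java.io.IOException;

final class LeadingBytes {

    /**
     * Returns up to {@code count} leading bytes of the stream as hex,
     * then rewinds so a parser can still read the stream from the start.
     */
    static String peekHex(BufferedInputStream in, int count) throws IOException {
        in.mark(count);                       // remember the current position
        byte[] buf = new byte[count];
        int total = 0;
        while (total < count) {               // read() may return fewer bytes than asked
            int n = in.read(buf, total, count - total);
            if (n < 0) break;                 // end of stream
            total += n;
        }
        in.reset();                           // rewind to the marked position

        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < total; i++) {
            if (i > 0) hex.append(' ');
            hex.append(String.format("%02X", buf[i] & 0xFF));
        }
        return hex.toString();
    }
}
```

A stream that starts directly with `<?xml` would show `3C 3F 78 6D 6C`, while a UTF-8 byte order mark would show up as a leading `EF BB BF`. Wrapping `response.getContent()` in a `BufferedInputStream` and logging `peekHex(in, 16)` before handing the same stream to the parser is one way to compare the two environments.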

It makes it very hard to trace the code for problems when I can only run the debugger on an environment that works perfectly (I haven't found any good way to remotely debug on GAE). Nevertheless, using the primitive means I have, I've tried a million approaches including:

XML with and without the prolog
With and without newlines
With and without the "encoding=" attribute in the prolog
Both newline styles
With and without the chunking information present in the HTTP stream

And I've tried most of these in multiple combinations where it made sense they would interact -- nothing! I'm at my wit's end. Has anyone seen an issue like this before that can hopefully shed some light on it?

Thanks!

556

asked Jun 13 '10 02:06

Adrian Petrescu

2 Answers

One possible cause is that the encodings declared in your XML and your XSD (or DTD) differ, for example:
XML file header: <?xml version='1.0' encoding='utf-8'?>
XSD file header: <?xml version='1.0' encoding='utf-16'?>

Another possible cause is anything appearing before the XML declaration. For example, you might have something like this in the buffer:

helloworld<?xml version="1.0" encoding="utf-8"?>

or even a space or special character.
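To illustrate the point, here is a small sketch that feeds the JDK's built-in StAX parser a document with and without leading text. The class name `PrologCheck` and the method `parses` are hypothetical helpers, and the sketch assumes the default `XMLInputFactory` implementation shipped with the JDK.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;

final class PrologCheck {

    /** Returns true if the whole document can be read without a parse error. */
    static boolean parses(String xml) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.nextEvent();           // throws on malformed input
            }
            return true;
        } catch (XMLStreamException e) {
            return false;                     // e.g. "Content is not allowed in prolog."
        }
    }

    public static void main(String[] args) {
        String good = "<?xml version=\"1.0\" encoding=\"utf-8\"?><a/>";
        System.out.println(parses(good));                 // clean document
        System.out.println(parses("helloworld" + good));  // text before the declaration
    }
}
```

The first call should succeed and the second should fail, since nothing other than the XML declaration may come first in the document.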

There are special characters called byte order marks (BOMs) that could be at the start of the buffer. Before passing the buffer to the parser, do this:

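String xml = "<?xml ...";
// Trim whitespace, then drop any run of non-word characters (such as a BOM) before the first '<'.
xml = xml.trim().replaceFirst("^([\\W]+)<", "<");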
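The snippet above works on a String. When the response is only available as an InputStream, as with `response.getContent()` in the question, a similar idea can be sketched at the byte level. This is an illustration under assumptions rather than a drop-in fix: `BomSkippingInput`, `skipUtf8Bom`, and `firstElementName` are made-up names, and only the UTF-8 byte order mark (`EF BB BF`) is handled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class BomSkippingInput {

    /** Wraps the stream and silently drops a leading UTF-8 BOM, if there is one. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int total = 0;
        while (total < head.length) {         // read() may return fewer bytes than asked
            int n = pushback.read(head, total, head.length - total);
            if (n < 0) break;                 // stream shorter than three bytes
            total += n;
        }
        boolean hasBom = total == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && total > 0) {
            pushback.unread(head, 0, total);  // not a BOM: put the bytes back
        }
        return pushback;
    }

    /** Parses the stream with StAX and returns the root element's local name. */
    static String firstElementName(InputStream in) throws IOException, XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(skipUtf8Bom(in));
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                return event.asStartElement().getName().getLocalPart();
            }
        }
        return null;                          // no element found
    }
}
```

Unlike the regular expression, this only removes an exact BOM and leaves any other leading bytes in place, so unexpected content still surfaces as a parse error instead of being hidden.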

195

answered Oct 05 '22 15:10

Romain Hippeau

I ran into this issue after inspecting the XML file in Notepad++ and saving it, even though the file started with the UTF-8 XML declaration <?xml version="1.0" encoding="utf-8"?>

It was fixed by saving the file in Notepad++ with Encoding > Encode in UTF-8 selected (it had been set to Encode in UTF-8-BOM).

answered Oct 05 '22 16:10

techloris_109
