When trying to parse incorrect XML that contains an invalid character reference, Java's SAX parser dies a horrible death with a fatal error such as
org.xml.sax.SAXParseException: Character reference "" is an invalid XML character.
Is there any way around this? Will I have to clean up the XML file before I hand it off to the SAX Parser? If so, is there an elegant way of going about this?
SAX (Simple API for XML) is an event-driven algorithm for parsing XML documents. SAX is an alternative to the Document Object Model (DOM). Where the DOM reads the whole document into memory before operating on it, SAX parsers read XML node by node, issuing parsing events as they step through the input stream.
SAX, the Simple API for XML, is an API used to parse XML documents. It is based on events generated while reading through the document. Callback methods receive those events, and a custom handler contains those callback methods.
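To make the callback idea concrete, here is a minimal sketch of a custom handler that just records the events it receives. The class name SaxEventDemo and the helper collectEvents are made up for illustration; the parser and handler classes are the standard JDK ones.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventDemo {

    // Hypothetical helper: parses the XML and records each callback as a string.
    static List<String> collectEvents(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                events.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver text in several chunks; skip blank ones.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text:" + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };
        try {
            // The parser pushes events into the handler while it reads the input.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(collectEvents("<book><title>SAX</title></book>"));
    }
}
```

The handler never asks for data; it only reacts to whatever the parser hands it, which is what "push" means here.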
It is better to use a StAX parser than a SAX parser for creating XML documents. Please refer to the Java StAX Parser section for details.
StAX is a bidirectional API, meaning that it can both read and write XML documents. SAX is read only, so another API is needed if you want to write XML documents. SAX is a push API, whereas StAX is pull. The trade-offs between push and pull APIs outlined above apply here.
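As a rough illustration of both directions, the sketch below reads element names with a pull loop and writes a small document with the StAX writer. StaxDemo, elementNames and writeBook are names invented for this example; the javax.xml.stream classes ship with the JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

class StaxDemo {

    // Pull side: the calling code drives the loop and asks for the next event.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX reading failed", e);
        }
        return names;
    }

    // Write side: the same API family can also produce XML.
    static String writeBook(String title) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("book");
            writer.writeStartElement("title");
            writer.writeCharacters(title); // special characters are escaped
            writer.writeEndElement();
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX writing failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = writeBook("StAX");
        System.out.println(xml);
        System.out.println(elementNames(xml));
    }
}
```

Compared with a SAX handler, the loop in elementNames decides when to advance, so it can stop early or skip ahead without any extra state.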
Use XML 1.1! skaffman is completely right, but you can just stick <?xml version="1.1"?> at the top of your files and you'll be in good shape. If you're dealing with streams, write a wrapper that rewrites or adds that XML declaration.
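Here is a rough sketch of that idea, working on a String for brevity rather than on a stream. It assumes the document either has no XML declaration or has one at the very start (no BOM or leading whitespace), and it assumes the JDK's built-in parser, which understands XML 1.1. Xml11Declaration, withXml11Declaration and parseText are hypothetical names. Keep in mind that XML 1.1 still rejects U+0000, and that control characters are only accepted as references, not as raw characters.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class Xml11Declaration {

    // Matches an XML declaration at the very start of the document.
    private static final Pattern XML_DECL = Pattern.compile("^<\\?xml\\s[^?]*\\?>");

    // Hypothetical helper: drops any existing declaration and prepends a 1.1 one.
    // Any encoding attribute is discarded, which is fine for already-decoded text.
    static String withXml11Declaration(String xml) {
        String body = XML_DECL.matcher(xml).replaceFirst("");
        return "<?xml version=\"1.1\"?>" + body;
    }

    // Parses the document with SAX and returns the concatenated character data.
    static String parseText(String xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed: " + e.getMessage(), e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String broken = "<a>x&#x1;y</a>";
        // Parsing 'broken' directly fails; with the 1.1 declaration it goes through.
        String text = parseText(withXml11Declaration(broken));
        System.out.println(text.length());
    }
}
```

Whether this is acceptable depends on who consumes the files afterwards, since not every XML tool accepts version 1.1 documents.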
You're going to have to clean up your XML, I'm afraid. Such characters are invalid according to the XML spec, and no amount of persuasion is going to convince the parser otherwise.
Valid XML characters for XML 1.0:
U+0009
U+000A
U+000D
U+0020 – U+D7FF
U+E000 – U+FFFD
U+10000 – U+10FFFF
In order to clean up, you'll have to pass the data through a more low-level processor, which treats it as a Unicode character stream, removing those characters that are invalid.
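One possible shape for such a clean-up step is sketched below, operating on a String for simplicity. It drops raw characters outside the XML 1.0 ranges listed above and, because the question is about character references, also drops numeric references that point at such characters. Xml10Sanitizer and its method names are invented for this example. It is a plain text filter with no XML awareness, so it will also touch matching text inside CDATA sections and comments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Xml10Sanitizer {

    // Numeric character references, decimal (&#1;) or hexadecimal (&#x1;).
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(x[0-9a-fA-F]+|[0-9]+);");

    // True if the code point is allowed by the XML 1.0 character ranges.
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Removes raw characters (including unpaired surrogates) that are invalid.
    static String stripInvalidChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        in.codePoints()
                .filter(Xml10Sanitizer::isValidXml10)
                .forEach(out::appendCodePoint);
        return out.toString();
    }

    // Removes numeric character references that resolve to invalid characters.
    static String stripInvalidCharRefs(String in) {
        Matcher m = CHAR_REF.matcher(in);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String digits = m.group(1);
            int cp;
            try {
                cp = digits.startsWith("x")
                        ? Integer.parseInt(digits.substring(1), 16)
                        : Integer.parseInt(digits);
            } catch (NumberFormatException e) {
                cp = -1; // too large to be a code point, so treat as invalid
            }
            String replacement = isValidXml10(cp) ? m.group() : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Applies both clean-up passes.
    static String sanitize(String in) {
        return stripInvalidCharRefs(stripInvalidChars(in));
    }

    public static void main(String[] args) {
        System.out.println(sanitize("<a>x&#x1;y&#65;</a>"));
    }
}
```

For large inputs the same isValidXml10 check could be moved into a java.io.FilterReader subclass so the data is filtered while streaming, though references split across buffer boundaries would then need extra care.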