
Parse a list of XML fragments with no root element from a stream input

Is it possible in Java, using the SAX API, to parse a list of XML fragments with no root element from a stream input?

I tried parsing such input, but the parser threw the following exception before the endDocument event was even fired:

org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.

I would rather not settle for obvious but clumsy solutions such as "prepend a custom root element" or "use buffered fragment parsing".

I am using the standard SAX API of Java 1.6. The SAX factory had setValidating(false) in case anyone wondered.

yannisf asked Jun 27 '12 12:06


1 Answer

First and most important, the content you are parsing is not a well-formed XML document. From the XML specification:

[Definition: There is exactly one element, called the root, or document element, no part of which appears in the content of any other element.]

Now, as to parsing this with SAX - in spite of what you said about clumsiness - I'd suggest the following approach:

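import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;

// Wrap the fragment stream between an opening and a closing root tag.
// `yourXmlLikeStream` is the InputStream that carries the fragments.
Enumeration<InputStream> streams = Collections.enumeration(
    Arrays.asList(new InputStream[] {
        new ByteArrayInputStream("<root>".getBytes()),
        yourXmlLikeStream,
        new ByteArrayInputStream("</root>".getBytes())
    }));

SequenceInputStream seqStream = new SequenceInputStream(streams);

// Now pass the `seqStream` into the SAX parser.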

SequenceInputStream is a convenient way to concatenate multiple input streams into a single stream. The streams are read in the order they are passed to the constructor (or, in this case, returned by the Enumeration).

Pass it to your SAX parser, and you are done.
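For completeness, here is a minimal end-to-end sketch of the wrapping approach. The class name FragmentParser, the method topLevelElements and the wrapper name "root" are made up for illustration and are not part of any API. It assumes the fragments are in UTF-8 (or another ASCII-compatible encoding) and that the stream does not begin with an XML declaration or DOCTYPE, since either would no longer be at the start of the document once the wrapper tag is prepended.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class FragmentParser {
    // Hypothetical wrapper element name; choose one that cannot clash with real content.
    static final String WRAPPER = "root";

    // Parses a stream of sibling XML fragments and returns the names of the
    // top-level fragment elements in document order.
    static List<String> topLevelElements(InputStream fragments) throws Exception {
        // Concatenate: opening wrapper tag, the fragments, closing wrapper tag.
        InputStream wrapped = new SequenceInputStream(Collections.enumeration(
                Arrays.<InputStream>asList(
                        new ByteArrayInputStream(("<" + WRAPPER + ">").getBytes("UTF-8")),
                        fragments,
                        new ByteArrayInputStream(("</" + WRAPPER + ">").getBytes("UTF-8")))));

        final List<String> names = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // 0 = outside the wrapper, 1 = directly inside it

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Depth 1 means a direct child of the artificial wrapper,
                // i.e. the start of one original fragment.
                if (depth == 1) {
                    names.add(qName);
                }
                depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        };

        SAXParserFactory.newInstance().newSAXParser().parse(wrapped, handler);
        return names;
    }
}
```

The depth counter lets the handler ignore the artificial wrapper, so the rest of the handler logic only ever sees the original fragments. Because SequenceInputStream reads its parts lazily, the fragment stream is still consumed incrementally rather than buffered in memory.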

npe answered Oct 14 '22 04:10