
Reading multiple xml documents from a socket in java

Tags: java, xml

I'm writing a client that needs to read multiple consecutive small XML documents over a socket. I can assume that the encoding is always UTF-8 and that the documents may be separated by optional whitespace. The documents should ultimately go into DOM objects. What is the best way to accomplish this?

The essence of the problem is that the parsers expect a single document in the stream and treat any remaining content as junk. I thought I could artificially end each document by tracking the element depth and then creating a new reader on the existing input stream, e.g. something like:

// Broken 
public void parseInputStream(InputStream inputStream) throws Exception
{
    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLOutputFactory xof = XMLOutputFactory.newInstance();
    XMLEventFactory eventFactory = XMLEventFactory.newInstance();        
    DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
    Document doc = documentBuilder.newDocument();
    XMLEventWriter domWriter = xof.createXMLEventWriter(new DOMResult(doc));
    XMLStreamReader xmlStreamReader = factory.createXMLStreamReader(inputStream);
    XMLEventReader reader = factory.createXMLEventReader(xmlStreamReader);
    int depth = 0;

    while (reader.hasNext()) {
        XMLEvent evt = reader.nextEvent();
        domWriter.add(evt);

        switch (evt.getEventType()) {
        case XMLEvent.START_ELEMENT:
            depth++;
            break;

        case XMLEvent.END_ELEMENT:
            depth--;

            if (depth == 0) 
            {                       
                domWriter.add(eventFactory.createEndDocument());
                System.out.println(doc);
                reader.close();
                xmlStreamReader.close();

                xmlStreamReader = factory.createXMLStreamReader(inputStream);
                reader = factory.createXMLEventReader(xmlStreamReader);

                doc = documentBuilder.newDocument();
                domWriter = xof.createXMLEventWriter(new DOMResult(doc));    
                domWriter.add(eventFactory.createStartDocument());
            }
            break;                    
        }
    }
}

However, running this on input such as <a></a><b></b><c></c> prints the first document and then throws an XMLStreamException. What's the right way to do this?

Clarification: Unfortunately the protocol is fixed by the server and cannot be changed, so prepending a length or wrapping the contents would not work.

eaubin asked May 10 '26 15:05


1 Answer

  • Length-prefix each document (with its length in bytes).
  • Read the length of the first document from the socket.
  • Read that many bytes from the socket, dumping them into a ByteArrayOutputStream.
  • Create a ByteArrayInputStream from the result.
  • Parse that ByteArrayInputStream to get the first document.
  • Repeat for the second document, and so on.
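
The steps above could be sketched roughly as follows. This is only an illustration under assumptions the answer does not spell out: the length prefix is taken to be a 4-byte big-endian int (what DataInputStream.readInt reads), and the class name LengthPrefixedXmlReader and method readAll are made up for the example. It also uses readFully into a byte array rather than copying through a ByteArrayOutputStream, which amounts to the same thing once the length is known. Note that it only helps if the sender can be changed to emit the prefix, which the clarification in the question rules out for that particular server.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: names and framing are assumptions for illustration.
public class LengthPrefixedXmlReader {

    // Reads length-prefixed XML documents until the stream ends.
    // Assumed framing: 4-byte big-endian length, then that many bytes of XML.
    public static List<Document> readAll(InputStream socketStream)
            throws IOException, SAXException, ParserConfigurationException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        DataInputStream in = new DataInputStream(socketStream);
        List<Document> documents = new ArrayList<>();

        while (true) {
            int length;
            try {
                length = in.readInt();
            } catch (EOFException endOfStream) {
                // No further (complete) length prefix: treat as end of input.
                return documents;
            }
            if (length < 0) {
                throw new IOException("Negative length prefix: " + length);
            }

            // Block until the whole document has arrived, then parse it
            // from memory so the parser cannot read past its end.
            byte[] payload = new byte[length];
            in.readFully(payload);
            documents.add(builder.parse(new ByteArrayInputStream(payload)));
        }
    }
}
```

Because each parse sees only its own byte array, the parser's internal read-ahead buffering can no longer swallow the start of the next document. With a real connection you would pass socket.getInputStream() to readAll, or move the loop body into your own read loop if documents need to be handled as they arrive.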
Jon Skeet answered May 13 '26 05:05