Scenario: I'm receiving a huge XML file over an extremely slow network, so I want to start the expensive processing as early as possible. Because of that I decided to use SAXParser.
I expected to get an event as soon as a tag has been read completely.
The following test shows what I mean:
    @Test
    public void sax_parser_read_much_things_before_returning_events() throws Exception {
        String xml = "<a>"
                + " <b>..</b>"
                + " <c>..</c>"
                // much more ...
                + "</a>";
        // wrapper to show what is read
        InputStream is = new InputStream() {
            InputStream is = new ByteArrayInputStream(xml.getBytes());
            @Override
            public int read() throws IOException {
                int val = is.read();
                System.out.print((char) val);
                return val;
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(is, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
                System.out.print("\nHandler start: " + qName);
            }
            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                System.out.print("\nHandler end: " + qName);
            }
        });
    }
I wrapped the input stream to see what is read and when the events occur.
What I expected was something like this:
    <a> <- output from read()
    Handler start: a
    <b> <- output from read()
    Handler start: b
    </b> <- output from read()
    Handler end: b
    ...
Sadly, the result was the following:
    <a> <b>..</b> <c>..</c></a> <- output from read()
    Handler start: a
    Handler start: b
    Handler end: b
    Handler start: c
    Handler end: c
    Handler end: a
Where is my mistake and how can I get the expected result?
It seems you are making wrong assumptions about how I/O works. An XML parser, like most software, will request data in chunks, because requesting single bytes from a stream is a recipe for a performance disaster.
This does not imply that the buffer must be filled completely before a read attempt returns. It's just that a ByteArrayInputStream is incapable of emulating the behavior of a network InputStream: all of its data is available immediately, and since your wrapper does not override read(byte[], int, int), the inherited implementation keeps calling read() until the requested buffer is full or the end of the stream is reached. You can easily fix that by overriding read(byte[], int, int) so that it does not return a complete buffer but, e.g., a single byte on every request:
    @Test
    public void sax_parser_read_much_things_before_returning_events() throws Exception {
        final String xml = "<a>"
                + " <b>..</b>"
                + " <c>..</c>"
                // much more ...
                + "</a>";
        // wrapper to show what is read
        InputStream is = new InputStream() {
            InputStream is = new ByteArrayInputStream(xml.getBytes());
            @Override
            public int read() throws IOException {
                int val = is.read();
                System.out.print((char) val);
                return val;
            }
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                // hand out at most one byte per request, like a very slow network;
                // Math.min keeps a zero-length request at zero bytes
                return super.read(b, off, Math.min(len, 1));
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(is, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
                System.out.print("\nHandler start: " + qName);
            }
            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                System.out.print("\nHandler end: " + qName);
            }
        });
    }
This will print
    <a>
    Handler start: a<b>
    Handler start: b..</b>
    Handler end: b <c>
    Handler start: c..</c>
    Handler end: c</a>
    Handler end: a?
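showing how the XML parser adapts to the availability of data from the InputStream. The trailing ? is the end-of-stream value -1, which the wrapper also prints after casting it to char.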
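If you want to reuse the trick outside of a single test, the same idea can be packaged as a small stream wrapper. The following is only a sketch: TricklingInputStream, its maxChunk parameter and the trace helper are names I made up for illustration, not part of any library. It caps every bulk read at maxChunk bytes and records reads and handler events in one list, so the interleaving can be checked instead of eyeballed.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class TricklingStreamDemo {

    /** Hypothetical helper: delivers at most maxChunk bytes per bulk read. */
    static class TricklingInputStream extends FilterInputStream {
        private final int maxChunk;
        private final List<String> log;

        TricklingInputStream(InputStream in, int maxChunk, List<String> log) {
            super(in);
            this.maxChunk = maxChunk;
            this.log = log;
        }

        @Override
        public int read() throws IOException {
            int val = in.read();
            if (val >= 0) {
                log.add("read " + (char) val);
            }
            return val;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Never hand out more than maxChunk bytes, however large the request is.
            int n = in.read(b, off, Math.min(len, maxChunk));
            if (n > 0) {
                log.add("read " + new String(b, off, n, StandardCharsets.ISO_8859_1));
            }
            return n;
        }
    }

    /** Parses xml through the trickling stream; returns reads and events in order. */
    static List<String> trace(String xml, int maxChunk) throws Exception {
        final List<String> log = new ArrayList<>();
        InputStream in = new TricklingInputStream(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), maxChunk, log);
        SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                log.add("start " + qName);
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }
        });
        return log;
    }

    public static void main(String[] args) throws Exception {
        for (String entry : trace("<a><b>..</b><c>..</c></a>", 1)) {
            System.out.println(entry);
        }
    }
}
```

With maxChunk set to 1 and the parser bundled with the JDK, I would expect the "start"/"end" entries to appear between the "read" entries, as in the output above. With a large maxChunk the ByteArrayInputStream delivers everything at once and all reads come first again. How early an event fires is up to the parser implementation, so treat the exact interleaving as an observation, not a guarantee.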