Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

JAVA: gathering byte offsets of xml tags using an XmlStreamReader

Tags:

java

xml

stax

Is there a way to accurately gather the byte offsets of xml tags using the XMLStreamReader?

I have a large xml file that I require random access to. Rather than writing the whole thing to a database, I would like to run through it once with an XMLStreamReader to gather the byte offsets of significant tags, and then be able to use a RandomAccessFile to retrieve the tag content later.

XMLStreamReader doesn't seem to have a way to track character offsets. Instead people recommend attaching the XmlStreamReader to a reader that tracks how many bytes have been read (the CountingInputStream provided by apache.commons.io, for example)

e.g:

CountingInputStream countingReader = new CountingInputStream(new FileInputStream(xmlFile)) ;
XMLStreamReader xmlStreamReader = xmlStreamFactory.createXMLStreamReader(countingReader, "UTF-8") ;


while (xmlStreamReader.hasNext()) {
    int eventCode = xmlStreamReader.next();

    switch (eventCode) {
        case XMLStreamReader.END_ELEMENT :
            System.out.println(xmlStreamReader.getLocalName() + " @" + countingReader.getByteCount()) ;
    }

}
xmlStreamReader.close();

Unfortunately there must be some buffering going on, because the above code prints out the same byte offsets for several tags. Is there a more accurate way of tracking byte offsets in xml files (ideally without resorting to abandoning proper xml parsing)?

like image 839
Dave Avatar asked Oct 22 '25 04:10

Dave


1 Answers

You could use getLocation() on the XMLStreamReader (or XMLEvent.getLocation() if you use XMLEventReader), but I remember reading somewhere that it is not reliable and precise. And it looks like it gives the endpoint of the tag, not the starting location.

I have a similar need to precisely know the location of tags within a file, and I'm looking at other parsers to see if there is one that guarantees to give the necessary level of location precision.

like image 192
Gigatron Avatar answered Oct 23 '25 18:10

Gigatron



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!