While reading an XML file using StAX and XMLStreamReader, I encountered a weird problem. Not sure if its an error or I am doing something wrong. Still learning StAX.
So the problem is,
XMLStreamConstants.CHARACTERS event, when I collect node text as XMLStreamReader.getText() method.ABC & XYZ returns only ABC
Simplified Java Source:
    // Start StaX reader
    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    try {
        XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(inStream);
        int event = xmlStreamReader.getEventType();
        while (true) {
            switch (event) {
                case XMLStreamConstants.START_ELEMENT:
                    switch (xmlStreamReader.getLocalName()) {
                        case "group":
                        // Do something
                            break;
                        case "source":
                            isSource = true;
                            break;
                        case "target":
                            isTarget = true;
                            break;
                        default:
                            isSource = false;
                            isTrans = false;
                            break;
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    if (srcData != null) {
                        String srcTrns = xmlStreamReader.getText();
                        if (srcTrns != null) {
                            if (isSource) {
                                // Set source text
                                isSource = false;
                            } else if (isTrans) {
                                // Set target text
                                isTrans = false;
                            }
                        }
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (xmlStreamReader.getLocalName().equals("group")) {
                        // Add to return list
                    }
                    break;
            }
            if (!xmlStreamReader.hasNext()) {
                break;
            }
            event = xmlStreamReader.next();
        }
    } catch (XMLStreamException ex) {
        LOG.log(Level.WARNING, ex.getMessage(), MessageFormat.format("{0} {1}", ex.getCause(), ex.getLocation()));
    }
I am not quite sure what exactly I am doing wrong or how to collect complete text of the node.
Any suggestions or tips would be a great help to move on learning StAX more. :-)
I have solved the problem after struggling and researching a bit.
It was a problem reading text with escaped entity references. You need to set
XMLInputFactory IS_COALESCING to true
XMLInputFactory.setProperty(XMLInputFactory.IS_COALESCING, true);
Basically this tells the parser to replace internal entity references with their respective replacement text (in other words, something like decoding) and read them as normal characters.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With