 

DocumentBuilder parsing breaks string when it hits '&'

I have this XML:
<user>
<name>H &amp; M</name>

And I parse it using this code:


    DocumentBuilder documentBuilder = null;
    Document document = null;

    try {
        documentBuilder = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder();
        document = documentBuilder.parse(is);
    } catch (Exception e) {
        return result;
    }

    NodeList nl = document.getElementsByTagName(XML_RESPONSE_ROOT);
    if (nl.getLength() > 0) {
        resp_code = nl.item(0).getAttributes().getNamedItem(
                XML_RESPONSE_STATUS).getNodeValue();

        if (resp_code.equals(RESP_CODE_OK_SINGLE)) {
            nl = document
                    .getElementsByTagName(XML_RESPONSE_TAG_CONTACT);
            NodeList values = nl.item(i).getChildNodes();

etc..

When I get the node value with node.getNodeValue(), I get only what comes before the ampersand, even though the ampersand is escaped.

I want to get the whole string: "H & M"

Thanks

Or Arbel asked Oct 11 '22 14:10



1 Answer

It depends on how the DOM for your XML document was constructed. In particular, the parser may represent "H & M" as multiple adjacent Text nodes (for example, split around the escaped ampersand), while your code expects just one. Try calling nodeVariable.normalize() on the element before getting its value.

According to the DOM API documentation: "normalize() - Puts all Text nodes in the full depth of the sub-tree underneath this Node, including attribute nodes, into a "normal" form where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes..."
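To illustrate the effect of normalize(), here is a minimal sketch. It does not parse the asker's XML. Instead it builds a `<name>` element by hand with three adjacent Text children, on the assumption that this is roughly what the asker's parser produced around `&amp;`. The class name `NormalizeDemo` and the helper `buildSplitName` are hypothetical names used only for this example. It also shows `getTextContent()`, an alternative not mentioned in the answer above, which concatenates all descendant text without modifying the tree (where the DOM implementation supports it).

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizeDemo {

    // Hypothetical helper: builds <name> with three adjacent Text children,
    // imitating a parser that splits character data around the escaped
    // ampersand (an assumption about the asker's parser).
    public static Element buildSplitName() throws ParserConfigurationException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element name = doc.createElement("name");
        doc.appendChild(name);
        name.appendChild(doc.createTextNode("H "));
        name.appendChild(doc.createTextNode("&"));
        name.appendChild(doc.createTextNode(" M"));
        return name;
    }

    public static void main(String[] args) throws Exception {
        Element name = buildSplitName();

        // Reading only the first child sees just the first fragment.
        System.out.println("[" + name.getFirstChild().getNodeValue() + "]"); // [H ]

        // getTextContent() concatenates the text of all descendants.
        System.out.println("[" + name.getTextContent() + "]"); // [H & M]

        // normalize() merges the adjacent Text nodes into a single one.
        name.normalize();
        System.out.println(name.getChildNodes().getLength()); // 1
        System.out.println("[" + name.getFirstChild().getNodeValue() + "]"); // [H & M]
    }
}
```

In the code from the question, the equivalent change would be to call normalize() on the element returned by nl.item(i) (or on the whole document) before walking its child nodes.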

mazaneicha answered Oct 15 '22 10:10
