
Xerces-C++ memory

Tags:

c++

xerces-c

I'm having trouble understanding Xerces-C++ memory management.

If I have this (example) XML file "config.xml":

<?xml version="1.0" encoding="UTF-8"?>
<settings>
    <port>
        <reference>Ref1</reference>
        <label>1PPS A</label>
        <enabled>true</enabled>
    </port>
</settings>

and this code:

#include <xercesc/dom/DOM.hpp>

XERCES_CPP_NAMESPACE_USE

DOMElement *nextChildElement(const DOMElement *parent)
{
    DOMNode *node = (DOMNode *)parent->getFirstChild();
    while (node)
    {
        if (node->getNodeType() == DOMNode::ELEMENT_NODE)
            return (DOMElement *)node;
        node = node->getNextSibling();
    }
    return nullptr;
}

int main(int argc, char **argv)
{
    XMLPlatformUtils::Initialize();

    XMLCh tempStr[100];
    XMLString::transcode("LS", tempStr, 99);
    DOMImplementation *impl = DOMImplementationRegistry::getDOMImplementation(tempStr);
    DOMLSParser *parser = ((DOMImplementationLS*)impl)->createLSParser(DOMImplementationLS::MODE_SYNCHRONOUS, 0);
    DOMDocument *doc = impl->createDocument(0, 0, 0);

    doc = parser->parseURI("config.xml");

    DOMElement *el = doc->getDocumentElement(); // <settings>
    el = nextChildElement(el);                  //   <port>
    el = nextChildElement(el);                  //     <reference>Ref1</reference>

    // Heap blows up here
    while (1) {
        char *cstr = XMLString::transcode(el->getTextContent());
        XMLString::release(&cstr); // cstr is "Ref1"
    }

    // and/or here
    while (1) {
        XMLCh *xstr = XMLString::replicate(el->getTextContent());
        char *cstr = XMLString::transcode(xstr); // cstr is "Ref1"
        XMLString::release(&cstr);
        XMLString::release(&xstr);
    }
}

Why does the program's heap memory blow up in the while (1) loops? Either loop causes the same memory growth:

[Screenshot: Xerces memory diagnostics]

Note: I'm using Visual Studio 2017, and I've tested this in the following configurations, all with the same results:

  • xerces-c-3.2.1, static lib, x64
  • xerces-c-3.2.1, dynamic (dll), x64
  • xerces-c-3.1.2, static lib, x64
asked Apr 05 '18 16:04 by Blair Fonville



1 Answer

The problem is that the function const XMLCh *getTextContent() allocates its return value on the document's heap (using the document's MemoryManager), and there is no way for the caller to deallocate that memory or mark it for recycling. So once the caller is done with the returned pointer, the memory is effectively orphaned until the entire document is released, at which point the MemoryManager frees all of its heap allocations. Calling getTextContent() in a loop therefore grows the document heap on every iteration.
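To make the mechanism concrete, here is a toy model in plain C++ with no Xerces involved. ToyDocumentHeap, ToyElement and heapBytesAfterCalls are hypothetical names, and the model assumes only what is described above: the document owns every block it hands out, and the getTextContent()-like call returns a fresh const copy that the caller has no way to free.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a document-owned memory manager: blocks are only
// freed when the heap object itself is destroyed (i.e. when the "document" is released).
class ToyDocumentHeap {
public:
    char *allocate(std::size_t n) {
        blocks_.push_back(std::make_unique<char[]>(n));
        bytesInUse_ += n;
        return blocks_.back().get();
    }
    std::size_t bytesInUse() const { return bytesInUse_; }

private:
    std::vector<std::unique_ptr<char[]>> blocks_;
    std::size_t bytesInUse_ = 0;
};

// Hypothetical stand-in for an element whose text content is fixed at construction.
class ToyElement {
public:
    ToyElement(ToyDocumentHeap &heap, std::string text)
        : heap_(heap), text_(std::move(text)) {}

    // Mimics getTextContent(): every call copies the text into a new block on
    // the document heap and returns a const pointer the caller cannot release.
    const char *textContent() const {
        char *buf = heap_.allocate(text_.size() + 1);
        std::memcpy(buf, text_.c_str(), text_.size() + 1);
        return buf;
    }

private:
    ToyDocumentHeap &heap_;
    std::string text_;
};

// Calls textContent() repeatedly, like the while (1) loops in the question,
// and reports how many bytes the document heap is holding afterwards.
std::size_t heapBytesAfterCalls(int calls) {
    ToyDocumentHeap heap;
    ToyElement ref(heap, "Ref1");
    for (int i = 0; i < calls; ++i) {
        const char *text = ref.textContent();
        (void)text;  // the caller is done with it, but nothing is reclaimed
    }
    return heap.bytesInUse();  // every block is freed only when `heap` goes out of scope
}
```

In this model heapBytesAfterCalls(1000) returns 5000: five bytes ("Ref1" plus the terminator) per call, none of them reclaimed until the heap is destroyed, which is the unbounded growth the question describes.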

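The solution is not to use getTextContent(), but to call getNodeValue() on the element's child text node instead. For a text node, getNodeValue() returns a pointer to the data the node already stores, rather than allocating a fresh copy on the document's internal heap. (Calling getNodeValue() on the element itself returns null, because element nodes have no value.)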
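As a toy illustration in plain C++ (not the Xerces API), a getNodeValue()-style accessor only hands back a pointer into storage the node already owns, so calling it in a loop allocates nothing and returns the same pointer each time. ToyTextNode and copyValue are hypothetical names used only for this sketch.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a text node: the node owns its character data.
class ToyTextNode {
public:
    explicit ToyTextNode(std::string data) : data_(std::move(data)) {}

    // Mimics getNodeValue() on a text node: returns a pointer into the node's
    // own storage, so no new memory is allocated per call.
    const char *nodeValue() const { return data_.c_str(); }

private:
    std::string data_;
};

// Copies the borrowed value into an independent std::string, so the result
// stays valid even after the node is destroyed.
std::string copyValue(const ToyTextNode &node) {
    return std::string(node.nodeValue());
}
```

In this model the pointer from nodeValue() is borrowed: it stays valid only while the node exists and is unmodified, so the value is copied out when it is needed for longer.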

Per this (non-)bug report:

That aside, getTextContent does not work anyway. It's buggy as all get out and is effectively useless. You can't read the DOM that way or you'll get inaccurate data back under a variety of different circumstances if there are non-adjacent Text nodes (and if there aren't, you don't need to use it anyway since the direct node value will be all you need).
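The point about non-adjacent text nodes can be illustrated with a hypothetical toy model in plain C++; ToyNodeType, ToyNode, firstChildTextOnly and allDirectText are illustrative names, not Xerces API. Suppose an element's text is split across several children, for example <reference>Ref<!-- note -->1</reference>, which the DOM represents as a Text node, a Comment node and another Text node (assuming the parser keeps comments). Reading only the first child, as readTextNode() does in the version below, returns just "Ref", while collecting every direct Text or CDATA child recovers "Ref1".

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified node model: just a type tag and a value.
enum class ToyNodeType { Element, Text, CDataSection, Comment };

struct ToyNode {
    ToyNodeType type;
    std::string value;
};

// Reads only the first child; returns an empty string if it is not a text node.
std::string firstChildTextOnly(const std::vector<ToyNode> &children) {
    if (!children.empty() && children.front().type == ToyNodeType::Text)
        return children.front().value;
    return std::string();
}

// Concatenates the values of every direct Text or CDATA child,
// skipping comments and child elements.
std::string allDirectText(const std::vector<ToyNode> &children) {
    std::string out;
    for (const ToyNode &child : children) {
        if (child.type == ToyNodeType::Text || child.type == ToyNodeType::CDataSection)
            out += child.value;
    }
    return out;
}
```

With real Xerces nodes, the same loop would presumably walk getFirstChild()/getNextSibling() and append getNodeValue() for DOMNode::TEXT_NODE and DOMNode::CDATA_SECTION_NODE children; that mapping is a suggestion, not something tested in this answer.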

So, a working version of the OP example code might look like this:

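#include <xercesc/dom/DOM.hpp>
#include <xercesc/util/PlatformUtils.hpp>  // XMLPlatformUtils::Initialize()
#include <xercesc/util/XMLString.hpp>      // XMLString::transcode() / release()
#include <string>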

XERCES_CPP_NAMESPACE_USE

DOMElement *nextChildElement(const DOMElement *parent)
{
    DOMNode *node = (DOMNode *)parent->getFirstChild();
    while (node)
    {
        if (node->getNodeType() == DOMNode::ELEMENT_NODE)
            return (DOMElement *)node;
        node = node->getNextSibling();
    }
    return nullptr;
}

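std::string readTextNode(const DOMElement *el)
{
    std::string sstr;
    // getFirstChild() returns null for an empty element such as <reference/>
    DOMNode *node = el->getFirstChild();
    if (node && node->getNodeType() == DOMNode::TEXT_NODE) {
        // getNodeValue() returns a pointer owned by the text node; nothing to free
        char *cstr = XMLString::transcode(node->getNodeValue());
        sstr = cstr;
        XMLString::release(&cstr); // transcode() output must be released by the caller
    }
    return sstr;
}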

int main(int argc, char **argv)
{
    XMLPlatformUtils::Initialize();

    XMLCh tempStr[100];
    XMLString::transcode("LS", tempStr, 99);
    DOMImplementation *impl = DOMImplementationRegistry::getDOMImplementation(tempStr);
    DOMLSParser *parser = ((DOMImplementationLS*)impl)->createLSParser(DOMImplementationLS::MODE_SYNCHRONOUS, 0);
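    // Use the document returned by the parser directly; creating an empty
    // document first and then overwriting the pointer would leak it.
    DOMDocument *doc = parser->parseURI("config.xml");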

    DOMElement *el = doc->getDocumentElement(); // <settings>
    el = nextChildElement(el);                  //   <port>
    el = nextChildElement(el);                  //     <reference>Ref1</reference>

    // No memory leak
    std::string nodestr;
    while (1) {
        nodestr = readTextNode(el); // nodestr is "Ref1"
    }
}
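One further refinement, offered as a sketch: in readTextNode(), the assignment sstr = cstr; can throw std::bad_alloc, and if it does, XMLString::release(&cstr) is never reached. Wrapping the release in a std::unique_ptr with a custom deleter makes that step exception-safe. To keep the example compilable without Xerces, toyTranscode, toyRelease, ToyReleaser and toStdString are hypothetical stand-ins that mimic the transcode/release pair (including release taking a char** and nulling the pointer); with Xerces you would call XMLString::transcode and XMLString::release in their place.

```cpp
#include <cstring>
#include <memory>
#include <string>

// Counts live buffers so the example can show nothing is leaked.
static int g_liveBuffers = 0;

// Hypothetical stand-in for XMLString::transcode(): returns a buffer the caller must release.
char *toyTranscode(const char *src) {
    char *buf = new char[std::strlen(src) + 1];
    std::strcpy(buf, src);
    ++g_liveBuffers;
    return buf;
}

// Hypothetical stand-in for XMLString::release(char**): frees the buffer and nulls the pointer.
void toyRelease(char **buf) {
    if (*buf) {
        delete[] *buf;
        *buf = nullptr;
        --g_liveBuffers;
    }
}

// Deleter that routes unique_ptr cleanup through the release function.
struct ToyReleaser {
    void operator()(char *p) const { toyRelease(&p); }
};

using TranscodedPtr = std::unique_ptr<char, ToyReleaser>;

// Exception-safe version of the copy step in readTextNode(): the buffer is
// released when cstr goes out of scope, even if building the std::string throws.
std::string toStdString(const char *src) {
    TranscodedPtr cstr(toyTranscode(src));
    return std::string(cstr.get());
}

int liveBuffers() { return g_liveBuffers; }
```

Calling toStdString() in a loop, like the while (1) loop above, leaves liveBuffers() at 0 after every iteration.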
answered Oct 10 '22 04:10 by Blair Fonville